FOREWORD


The great potential of adaptive communications in all areas of the National
Space Program, the growing need for more economic data transmission systems, 
and the lack of any previous unified action to bring developers and users together 
led to the Conference on Adaptive Telemetry, sponsored by NASA and held at 
Goddard Space Flight Center on February 15 and 16, 1966. This document con- 
tains those conference papers which were available at the time of printing with 
abstracts of unavailable papers appended in the interest of completeness. 

The Conference proceedings were arranged into four sessions: 

I Adaptive Prediction and Data Compression 
II Adaptive Data Control and Processing 
III Adaptive Encoding, Decoding, and Modulation
IV Adaptive Telemetry — Problems and Limitations 

There was no attempt to compartmentalize the sessions precisely and re- 
strict a topic — Data Compression, for instance — to a single session. Also, 
unfortunately, it has not been possible to capture the additional information and 
interpretations which came out in the discussion periods at the end of each 
conference session. 

This conference was the first, to our knowledge, devoted solely to the sub- 
ject of adaptive telemetry. Although the subject is relatively new and is being 
investigated by a relatively small number of researchers, the conference was 
enthusiastically supported and attended and offered a unique opportunity for the 
clarification of concepts and cross-fertilization of ideas. Naturally, the work
reported in this field had been done before the conference, but the results had
not been widely circulated or closely correlated.

From the standpoint of the quality of the papers, the wide representation of 
the participants, and the interest stimulated, the conference was very rewarding. 
The more significant results were (1) an exchange of views on the fundamental 
concepts of adaptive telemetry, (2) a better delineation of the various tech- 
niques encompassed by the term "adaptive," and (3) the identification of adaptive 
features in existing and proposed hardware implementations. The papers presented
were fairly evenly divided among these objectives, and the theme of the
conference proceeded generally in that order. It is hoped that interest and
work in adaptive telemetry will grow and that important advances in the fields of
theory, techniques, and applications will be made. In particular, it is hoped that
this conference has stimulated the interest of other potential users with whom 
we wish to open and maintain a close communication. 

The original idea for a conference which would bring together the
proponents of adaptive telemetry was generated by Dr. Balakrishnan and his 
associates at UCLA about a year ago. A proposal to hold this conference at 
Goddard Space Flight Center received the immediate and enthusiastic support of 
Dr. John F. Clark, Director of the Center. Following the decision to hold the 
conference, strong support and valuable guidance were received from Dr. A. V.
Balakrishnan at UCLA, Dr. L. D. Davisson of Princeton, Dr. J. C. Hancock of 
Purdue, Dr. J. Weber of USC, and researchers at Stanford University and SRI. 
From private industry, help was received from Dr. R. W. Sanders of Space 
General, Mr. L. Gardenhire of Radiation Inc., and research personnel of Astro- 
Power, EMR, and Hughes. 

The number of those who participated in the conference, either as attendees, 
registrants, or in a more active capacity, exceeded 125 persons. Twenty-six
papers were presented orally. The breadth of the participation in the conference 
can be seen by noting the affiliations of the authors. Representatives of six 
private companies presented papers. Five papers came from universities, six 
from the Jet Propulsion Laboratory, and nine from the Goddard Space Flight 
Center. 

It is a great pleasure to acknowledge the support and contributions of the 
many people who helped to plan and conduct the conference, presented papers, 
and prepared these Proceedings. Particular credit should go to Dr. Balakrishnan,
who was cochairman of the conference, to Doctors Benn Martin, George Ludwig,
and Robert Rochelle who chaired separate sessions, and to Mr. Alfred Shehab of 
the Goddard Educational and Special Program Office and Mr. Charles Laughlin 
of the Systems Division who helped with the organization of the conference. 


R. A. Stampfl 
Greenbelt, Maryland 
July, 1966 
WELCOME


CONFERENCE ON ADAPTIVE TELEMETRY 


I regret that I will be unable to join you in what appears to be an interesting 
and timely meeting. However, I have reviewed with Dr. Stampfl your agenda for 
this two-day conference and I am pleased to note your interest and concern with 
the most effective management of large quantities of scientific data — an area 
of great importance to Goddard Space Flight Center. 

Goddard is certainly a very logical place for this conference. Because of 
our assigned missions, we were among the first to be faced with these data 
management problems. Over the past 7 years, NASA has succeeded in putting 
into orbit dozens of satellites, both manned and unmanned, in order to learn 
more about our earth's natural environment so that we can apply the knowledge 
gained for the benefit of all mankind. This effort has involved the development 
of launch vehicles, spacecraft, and tracking and data systems. Literally hun- 
dreds of different instruments have been employed in space to acquire these 
data and to return them to Earth. The tasks of eliminating unnecessary
redundancy from these data before transmission, and of collecting and processing
them for use by the scientific investigators, are of vital concern to all of us
involved in the management of these important space flight missions. Today, since we
are confronted with the necessity of handling efficiently the constantly increasing 
data rates with essentially fixed resources of personnel and funds, the most 
competent resolution of the difficult problems involved is of the utmost im- 
portance. 

I am therefore delighted that this conference will give the Goddard staff an 
opportunity to be exposed to your ideas and thinking, while you gain further in- 
sight into our needs and problems. Please accept my best wishes for a pleasant 
and fruitful meeting. 


John F. Clark 
Director, GSFC 
Greenbelt, Maryland 
OPENING REMARKS




SPACE EXPLORATION'S CHALLENGE 
TO ADAPTIVE TELEMETRY* 


Although the National Space Program has not suffered significantly to date
from a limited satellite data transmission capability, there is hardly any area 
of the future program that cannot benefit from the application of adaptive 
telemetry. Many of those attending this conference are not only contributors to 
the discipline but also potential users. The most difficult problem for the user 
is the rigorous definition of the information which is essential to him. Once 
this goal is attained, vast quantities of data may not be needed, results of re- 
search will be available more quickly, and the search can proceed to the next 
phase. An adaptive system, by implication, is useful for many different sources 
of information and the constraints on, or assumptions about, the source are few. 

This is a vital characteristic from the user's point of view because the exact
nature of his data source is unknown. It is here that our enthusiasm for the
art should be tempered with restraint, for the limits of adaptive telemetry may
cause rejection of the entirely unexpected and thus deprive us of discovery of
the unanticipated.


GENERAL CONSIDERATIONS 


Telemetry systems need not always be designed on a worst case basis. If a 
worst case situation can be presumed to be a transitory situation, then advan- 
tage should be taken of the intervals during which less severe requirements are 
placed on the system. In addition, telemetry systems are often called upon to 
handle highly redundant information sources. In such a case a telemetry system 
designer is obligated by the demands of other sources to take maximum
advantage of this redundancy. Often, the ultimate usefulness of telemetered data
is time and level dependent. This is a nonquantitative aspect which transcends
all theoretical information concepts; nonetheless, it is highly important. It is
not difficult to provide examples where the transmission of unnecessary data
(due to the inability of a system to adapt to changing conditions
or requirements) seriously limits the overall effectiveness of a space mission. 
The more subtle advantages of adaptive techniques apply to virtually every 
future space program. 


* A consolidation of the opening remarks of Dr. Balakrishnan, UCLA, and Dr. Stampfl, GSFC,
chairman of the Conference on Adaptive Telemetry. 
The need for transmitting only the most meaningful information is dramatized
by the present and foreseeable telemetry limitations at planetary distances.
With reasonable ground facilities, an 8-foot antenna reflector would permit less 
than 50 bits per second to be transmitted to Earth from a spacecraft at 5 astro- 
nomical units distance. Even with a 30-foot antenna reflector, which can reason- 
ably be expected for space flight, this figure would only increase to a few 
thousand bits per second. Since such a telemetry system cannot be allowed to 
be choked up with redundant data, some form of adaptivity will be essential. 

Closer to our planet, the time is already here when literally unbounded 
quantities of data can be collected continuously from satellites at geosynchronous 
altitudes. Most of Earth will soon be under continuous observation so that sen- 
sors not only can detect a multiplicity of parameters with high angular resolu- 
tion but also can collect and distribute other kinds of data. 

Optical sensors, like television cameras or infrared scanners, are usually 
quoted as data source examples because of their large bandwidth requirements. 
They can also be cited as the outstanding example of the disparity between the 
number of bits to be transmitted and the actual useful information content in the 
delivered image. As is well known, for weather observation only gross cloud 
patterns are fed into the prediction process. It is therefore only natural that
adaptive compression techniques have so far been applied to these data sources.

Improvement in weather predictions will depend upon progress in adaptive 
telemetry when large numbers of atmospheric soundings are to be performed 
continuously. The boundary conditions derived from these soundings are needed 
to update solutions of the atmospheric prediction equations. The economic bene- 
fits of accurate weather predictions that are valid for many days or perhaps weeks 
in advance are obvious. 

Another example is the air traffic control problem of the next decade. Its
solution will require more than just highly sophisticated versions of present
technology, largely because of the
characteristics of the supersonic transports currently under development. To 
be economically justifiable, these aircraft will have to be airborne virtually all 
of the time; thus, ground maintenance and turnaround times will be of paramount 
importance. Therefore, telemetry during flight will be needed to diagnose air- 
craft for servicing; also, many additional data will have to be collected for cen- 
tral processing. Telemetry systems aboard the supersonic transport will have 
much in common with large spacecraft telemetry systems. Extrapolating from 
these, it can be readily appreciated that adaptive processing systems having the 
ability to meet changing situations will make the necessary diagnoses and evalua- 
tions available faster and more economically. 
Beyond this, supersonic transports must have complete and reliable atmospheric
weather information before takeoff in order to obtain optimum cruising
altitudes and routes to planned destinations, so that they can avoid the heavy 
economic penalties involved in carrying excess fuel. 

ADAPTIVE TELEMETRY CHARACTERIZATION 

In a broad sense, any system capable of changing or adjusting to meet the 
requirements of a different condition can be called adaptive. Accordingly, any 
self-adjusting mechanism (such as AGC or AFC systems) would be included as 
well as other conventional open or closed loop feedback systems. Indeed, the 
entire field of biological systems, including man, would be included with the 
overriding criterion of self-preservation. Such a broad classification is not 
very useful for the purposes at hand. However, if we add the requirement that
the systems of interest must modify themselves as the result of learning, we are
left with a much more restricted and interesting class of systems. 

In particular, the specification and design of optimum (from the standpoint 
of a posteriori probability) telemetry receivers are highly developed areas of
engineering. That is, for nearly any given set of conditions, an optimum re- 
ceiver can be hypothesized, analyzed, and at least nearly realized in practical 
form. In addition, adaptable receivers — that is, receivers that can adjust to 
optimum configurations for different sets of a priori conditions — are entirely 
practical. Such receivers imply an ability to make estimates on certain system 
parameters and to make adjustments in the decision structure which utilizes 
these estimates. If it is assumed that the system parameters are changing in an 
a priori unknown fashion, then such a receiver must be able to learn the condi- 
tion of these system parameters; this requires the measurement or monitoring 
of performance against given criteria. 

Three operations, then, can be required to delineate a more restricted as well
as a more important class of adaptive telemetry systems. They are as follows:

(1) Monitoring the system performance by measurements against given 
performance criteria and relating these measurements to certain system 
parameters. Here the monitoring procedure refers to deriving certain 
parameters from either the data source at the transmitter or the ob- 
served data at the receiver for application to the learning process. 

(2) Learning or estimating the states of certain channel parameters and
evaluating the quality of the received information. Here the estimation
is derived from the monitoring signals, and a learning process may be
employed to change prior knowledge at the receiver.

(3) Modifying or adjusting parameters in a way which optimizes the re- 
ceiver structure as instructed through the learning process. 
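As a purely illustrative sketch, not drawn from the conference papers, the three operations can be mapped onto a hypothetical on-off-keyed receiver whose "one" amplitude is unknown. The receiver monitors the deviation of each sample from the level it currently expects, learns an amplitude estimate from those deviations, and modifies its decision threshold accordingly. The function name, step size, and signal model are all assumptions.

```python
import random


def adaptive_ook_receiver(samples, amp_guess=1.0, step=0.05):
    """Hypothetical on-off-keyed receiver illustrating monitor/learn/modify.

    samples   -- received values, nominally amp * bit + noise (bit in {0, 1})
    amp_guess -- initial (possibly wrong) estimate of the 'one' amplitude
    step      -- learning rate of the amplitude estimator (an assumption)
    Returns the decided bits and the final amplitude estimate.
    """
    amp_est = amp_guess
    bits = []
    for r in samples:
        # (3) Modify: place the decision threshold midway between the
        #     'zero' level and the currently estimated 'one' level.
        threshold = amp_est / 2.0
        bit = 1 if r > threshold else 0
        bits.append(bit)
        if bit == 1:
            # (1) Monitor: measure the deviation of the sample from the
            #     level the receiver currently expects for a 'one'.
            residual = r - amp_est
            # (2) Learn: nudge the amplitude estimate toward the observations.
            amp_est += step * residual
    return bits, amp_est


if __name__ == "__main__":
    rng = random.Random(0)
    sent = [rng.randrange(2) for _ in range(2000)]
    received = [3.0 * b + rng.gauss(0.0, 0.1) for b in sent]
    decided, estimate = adaptive_ook_receiver(received)
    errors = sum(a != b for a, b in zip(sent, decided))
    print(f"estimated amplitude {estimate:.3f}, bit errors {errors}")
```

Here the monitored quantity is simply the residual; a real system might instead monitor another performance measure, such as the error rate on known synchronization bits.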

It should be observed that these requirements do not restrict the location of 
the functions. Thus, adaptive procedures may be applied at the transmitter 
location, in which case encoding equipment may be adapted to accommodate the 
changes in the data source. Alternatively, the adaptive procedures may be applied
at the receiver location, in which case decision functions may be adapted to
accommodate changes in the transmitter operation or changes in the transmission
medium. A combination of adaptive procedures can be applied at both the
transmitter and receiver locations, with or without feedback. In the feedback
case, the ultimate user of the telemetry data might well be included in the
system, perhaps to perform part of the learning and modifying functions.
1. ADAPTIVE METHODS IN COMMUNICATION SYSTEMS


A. V. Balakrishnan 

University of California 
Los Angeles, California 

N67-27403 

There has been much activity in the past concerning adaptive control 
systems; however, adaptive methods in communication theory are of more re- 
cent origin (References 1 to 4). The introduction of adaptive methods is a 
measure of sophistication in systems design made possible by the availability 
of high-speed large-scale digital computers. In communication systems (as 
well as in radar and data-processing systems generally), Earth-based receiver 
complexes already include fairly elaborate computing facilities, often as an in- 
tegral part of the receiver, as, for example, in data transmission systems or 
space communication systems. The addition of some extra logic or arithmetic 
operations to accommodate adaptive methods is no longer a source of major 
difficulty. In fact, in many areas outside of communications, such as
geophysical prospecting, processing capabilities are already outstripping the
extant theory that can take advantage of them. In short, the need cannot be
overemphasized for adaptive methods that allow optimal operation free from
preset constraints and that adapt to changing data or system conditions.

The present paper treats the optimization theory for adaptive systems that 
involve the transmission and/or reception of information. After examining the 
features that define adaptivity, three specific applications are discussed: the
adaptive feature in the transmitter, that in the receiver, and that in both
receiver and transmitter. In the first case, a PCM transmitter is designed to
minimize average transmitter power directly by adapting its signal amplitudes to
the probabilities of the source states. The second application is a diversity
receiver which seeks to minimize error by adapting to various signal strengths.
The third is an adaptive data compression system that exploits data redundancy
by utilizing an adaptive data predictor; the bandwidth reduction is initiated
only if the data warrant it.

These examples also serve to illustrate the problems that still remain be- 
fore a general theory of adaptive optimization can evolve. One of the main diffi- 
culties is the quantitative evaluation of adaptive systems. 

DEFINING FEATURES OF ADAPTIVE SYSTEMS 

An adaptive system is distinguished primarily by a learning mode that is 
either explicit or implicit. However, to be truly adaptive it must have other 
features as well. The following interpretations of dictionary definitions serve
to illustrate this point: "adaptive" implies modification to meet new conditions,
and "adjusting" implies bringing into as exact or close a correspondence as
exists between the parts of a mechanism, but suggests more tact or ingenuity in
the agent. Actually, it is impossible to draw a sharp distinction between what
is adaptive and what is not. The word adaptive does signify something more than
self-adjusting mechanisms (such as an AGC or AFC system, where there are no
conditions not envisaged at the time of design). Without getting involved in
semantics, one can say that an adaptive system is one which monitors its own
performance; when new conditions arise that degrade the performance, the system
learns what they are and changes its structure accordingly.

In a communication system the new conditions might stem from a priori 
unknowable channel parameter changes. These parameter changes have to be 
learned or measured, and appropriate changes must be made in the system to
maintain a prescribed level of fidelity or another chosen criterion. Let us
consider some simple examples. The usual Shannon-Fano noiseless coding for
minimal length needs a complete specification of the message probabilities.
These probabilities are unknown and have to be measured. Moreover, if the
probabilities are also subject to change, there must be a learning phase, whether
it is separate from the coding phase or not. One cannot call such a system
adaptive, since no monitoring of the performance is involved. In a two-way or
feedback communication system (such as a data transmission system), the
receiver monitors the quality of performance (with an error-detecting code) and
the transmitter adjusts by reducing the transmission rate or stopping the
transmission altogether. Since there is no learning involved, this is not quite
adaptive by the definition given above; in an adaptive system, the learning
phase is triggered by a degradation in performance. A radar system that simply
measures the clutter or reverberation parameters and then changes the detection
circuitry parameters is not quite adaptive either, since there is no
self-monitoring criterion. If the transmission system learns or measures
channel characteristics and changes the mode of transmission accordingly to
restore the performance quality, it would be a truly adaptive system.
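The non-adaptive "measure, then code" procedure described above can be sketched as follows: the message probabilities are first estimated by relative frequency over a training block, and a code is then built from the estimates using one common form of the Shannon-Fano construction (recursively splitting the probability-ordered symbols into two groups of nearly equal total probability). The function names and the training-block interface are illustrative assumptions. Nothing in the sketch monitors performance, which is why the text declines to call such a system adaptive.

```python
from collections import Counter


def shannon_fano(probs):
    """Build a Shannon-Fano code from a dict {symbol: probability}."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if len(items) == 1:
        return {items[0][0]: "0"}  # a lone symbol still needs one bit
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        running, best_cut, best_diff = 0.0, 1, float("inf")
        # Choose the cut that makes the two parts' probabilities most nearly equal.
        for cut in range(1, len(group)):
            running += group[cut - 1][1]
            diff = abs(total - 2.0 * running)
            if diff < best_diff:
                best_cut, best_diff = cut, diff
        upper, lower = group[:best_cut], group[best_cut:]
        for sym, _ in upper:
            codes[sym] += "0"
        for sym, _ in lower:
            codes[sym] += "1"
        split(upper)
        split(lower)

    split(items)
    return codes


def learn_then_code(training_block):
    """Learning phase (relative frequencies) followed by the coding phase."""
    counts = Counter(training_block)
    n = len(training_block)
    estimated = {sym: c / n for sym, c in counts.items()}
    return shannon_fano(estimated)


print(learn_then_code("AAAABBC"))  # {'A': '0', 'B': '10', 'C': '11'}
```

If the source statistics drift after the training block, the code keeps using stale estimates; only repeating the learning phase would correct it.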

Actually, the term "adaptive" is beginning to be used whenever there is a
learning or measurement feature (implicit or explicit), even if there is no
self-monitoring. (The terms "self-regulating" or "self-adjusting" are used for
systems with the monitoring feature but no learning feature.)

This paper is concerned only with learning or adaptive systems. In keeping
with this, adaptive communication systems are categorized as (1) transmitter
adaptive (adaptive methods at the transmitter), (2) receiver adaptive (adaptive
methods at the receiver), and (3) both receiver and transmitter adaptive. The
classification is based upon whether the primary adaptive mechanism is at the
transmitter or at the receiver, even though a measure of complementary
adaptivity may be involved at both ends. The discussions that follow are
intended to illustrate the formulation of the problems that arise.

TRANSMITTER ADAPTIVE SYSTEMS 

An adaptive system in which the major adaptive function is at the transmitter
(but without the performance monitoring feature) will be considered first.
Consider a binary source in which successive bits are statistically independent,
where the need is to transmit the information coherently using, as usual,
$A_1 \cos \omega_c t$ or $A_2 \cos(\omega_c t + \theta)$, $0 \le t \le T$. If we use
a coherent receiver and the channel noise is additive white Gaussian noise, it is
readily seen (Reference 5) that $\theta = \pi$ should be chosen without regard to
$A_1$ and $A_2$. It is also assumed that state one has probability $P_1$ (and
state two probability $P_2 = 1 - P_1$) and that this is known to the transmitter
and receiver. Where $T \gg 1$, the average transmitted power is

$$P_1 \frac{A_1^2}{2} + P_2 \frac{A_2^2}{2} \qquad (1)$$

The probability of error when the optimal receiver is used is an involved
problem because, ordinarily, $P_1 = 1/2$ is assumed in the optimum receiver
design. The all-important reason for this is that the receiver cannot do better
than to minimax; hence the receiver threshold is $(A_1 - A_2)/2$, and the
probability of error is decreased by increasing $A_1 + A_2$. It is then assumed
that $A_1 + A_2$ is fixed to provide a desired error rate; e.g., let
$A_1 + A_2 = A$. The transmitter is designed to provide the minimum output power
consistent with the minimum acceptable error. The average power given by
Equation 1 can be minimized for a fixed value of $A$ by taking $A_1 = (1 - P_1)A$
and $A_2 = P_1 A$, yielding the minimum $\frac{1}{2} P_1 (1 - P_1) A^2$. (At the
optimum $P_1 A_1 = P_2 A_2$, which together with $A_1 + A_2 = A$ gives these
values.) Such a selection is obviously in accord with one's intuition.
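The closed-form result can be checked numerically. The sketch below, with hypothetical function names, evaluates Equation 1 for the amplitudes $A_1 = (1 - P_1)A$, $A_2 = P_1 A$ and compares the result with an equal-amplitude design ($A_1 = A_2 = A/2$). Both designs have the same separation $A_1 + A_2 = A$ and hence, under the stated assumptions, the same error rate.

```python
def average_power(p1, a1, a2):
    """Equation 1: P1*A1^2/2 + (1 - P1)*A2^2/2."""
    return p1 * a1 ** 2 / 2.0 + (1.0 - p1) * a2 ** 2 / 2.0


def optimal_amplitudes(p1, a_sum):
    """Minimize Equation 1 subject to A1 + A2 = a_sum (larger amplitude for the rarer state)."""
    return (1.0 - p1) * a_sum, p1 * a_sum


for p1 in (0.5, 0.7, 0.9, 0.99):
    a1, a2 = optimal_amplitudes(p1, 2.0)
    adaptive = average_power(p1, a1, a2)  # equals 0.5 * p1 * (1 - p1) * 2.0**2
    equal = average_power(p1, 1.0, 1.0)   # equal amplitudes, same A1 + A2
    print(f"P1={p1:4.2f}  optimal={adaptive:.4f}  equal={equal:.4f}  ratio={adaptive / equal:.3f}")
```

The ratio printed is $4P_1(1 - P_1)$; for example, at $P_1 = 0.9$ the optimized design needs only 0.36 of the equal-amplitude power.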

Since more power is provided for the less likely state, the power reduction can
be substantial. A closer examination of the notion of average power is now in
order. A phase average has been taken and, therefore, the law of large numbers
must be invoked to interpret it as a time average:

$$\lim_{n \to \infty} \frac{1}{nT} \int_0^{nT} A(t)^2 \, dt = E(A_n^2).$$


By the approach described, the average power is reduced while the same error
rate is maintained. The only difficulty is that the probability $P_1$ in any
practical communication system is unknown and must be evaluated en route.

Here the system has to adapt to the particular signal structure, which cannot be
predetermined. If the view is taken that the probability $P_1$ has to be
evaluated first and the amplitudes $A_i$ chosen accordingly, then the problem
arises of determining what constitutes an optimum estimate or, more properly, an
optimum procedure. It is clear that it is better to seek the overall optimum
directly. Assume that the decisions are to be based on an $N$-chain of digits
$S_j$, where $S_j = 1$ or $2$, and let $(A_n)$ denote the chosen amplitudes, based
on the past $N$ digits so that $A_n = f(S_n, S_{n-1}, \ldots, S_{n-N})$, where
$A = f(1, S_{n-1}, \ldots, S_{n-N}) + f(2, S_{n-1}, \ldots, S_{n-N})$. Then, to
minimize the average power when the message probability structure is known,
consider

$$E(A_n^2) = \sum_{S_{n-1}, \ldots, S_{n-N}} \left[ f(1, S_{n-1}, \ldots, S_{n-N})^2 \, P_1 + f(2, S_{n-1}, \ldots, S_{n-N})^2 \, (1 - P_1) \right] P(S_{n-1}, \ldots, S_{n-N}).$$

Here $P_1 = P(1 / S_{n-1}, \ldots, S_{n-N})$, and the optimum choice is

$$f(1, S_{n-1}, \ldots, S_{n-N}) = A \left[ 1 - P(1 / S_{n-1}, \ldots, S_{n-N}) \right],$$

with $f(2, S_{n-1}, \ldots, S_{n-N}) = A - f(1, S_{n-1}, \ldots, S_{n-N})$.
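To illustrate why conditioning on past digits can pay off, consider a hypothetical first-order Markov source ($N = 1$) whose transition probabilities are invented for illustration. The sketch computes $E(A_n^2)$ when the amplitudes follow the conditional probability $P(1/S_{n-1})$, as in the optimum choice above, and compares it with amplitudes based only on the marginal probability of a "one." Function names and numbers are assumptions.

```python
def stationary_distribution(p1_given):
    """p1_given[prev] = P(S_n = 1 | S_{n-1} = prev), prev in {1, 2}."""
    leave_1 = 1.0 - p1_given[1]  # P(next = 2 | prev = 1)
    enter_1 = p1_given[2]        # P(next = 1 | prev = 2)
    pi1 = enter_1 / (leave_1 + enter_1)
    return {1: pi1, 2: 1.0 - pi1}


def mean_square_amplitude(p1_given, a=1.0, use_context=True):
    """E(A_n^2) when f(1, .) = A(1 - P1) and f(2, .) = A P1.

    With use_context=True, P1 is the conditional probability given the previous
    digit (the N = 1 case of the text); otherwise it is the marginal probability.
    """
    pi = stationary_distribution(p1_given)
    total = 0.0
    for prev, weight in pi.items():
        p_true = p1_given[prev]                    # actual probability of a 'one'
        p_used = p_true if use_context else pi[1]  # probability the design assumes
        f1, f2 = a * (1.0 - p_used), a * p_used
        total += weight * (f1 ** 2 * p_true + f2 ** 2 * (1.0 - p_true))
    return total


chain = {1: 0.9, 2: 0.2}  # hypothetical transition probabilities
print(mean_square_amplitude(chain), mean_square_amplitude(chain, use_context=False))
```

With these assumed transition probabilities, conditioning lowers $E(A_n^2)$ from $2/9 \approx 0.222\,A^2$ to about $0.113\,A^2$. When both conditional probabilities are equal (independent bits), the two designs coincide, in agreement with the heuristic choice.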


When the bits are independent, this agrees with the heuristic choice discussed
above. The main difficulty is that these probabilities are unknown and must be
estimated. To proceed in this case, it is assumed that there is an unknown
parameter $\theta$ (multidimensional, perhaps) such that

$$P(S_n, \ldots, S_{n-N}) = \int p(S_n, \ldots, S_{n-N} / \theta) \, p(\theta) \, d\theta,$$

where $p(S_n, \ldots, S_{n-N} / \theta)$ is known for every value of $\theta$, but
the distribution $p(\theta)$ is unknown, although negotiable (in a manner to be
discussed). If it is assumed that $\theta$ is a random variable, and the $N$-chain
$[S_{n-1}, S_{n-2}, \ldots, S_{n-N}]$ is denoted by the vector $S$, then the
dependence on $\theta$ can be shown by


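$$E(A_n^2) = \sum_S \int \left[ f(1, S)^2 P(1, S/\theta) + f(2, S)^2 P(2, S/\theta) \right] p(\theta) \, d\theta$$

$$= \sum_S \int \left[ f(1, S)^2 P(1/S, \theta) + f(2, S)^2 P(2/S, \theta) \right] P(S/\theta) \, p(\theta) \, d\theta,$$

so that the optimal choice can be expressed as

$$f(1, S) = A \frac{\int P(2, S/\theta) \, p(\theta) \, d\theta}{\int P(S/\theta) \, p(\theta) \, d\theta}$$

and

$$f(2, S) = A \frac{\int P(1, S/\theta) \, p(\theta) \, d\theta}{\int P(S/\theta) \, p(\theta) \, d\theta}.$$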


The similarity of the present problem to decision-theoretic problems allows the
methods and procedures of decision theory to be utilized in making an evaluation
of $p(\theta)$. First, it is noted that the optimal choice can be written

$$f(1, S) = A \int P(2/S, \theta) \, P(\theta/S) \, d\theta$$

and

$$f(2, S) = A \int P(1/S, \theta) \, P(\theta/S) \, d\theta.$$


These are conditional expectations of the probabilities $P(2/S, \theta)$ and
$P(1/S, \theta)$; hence, they are also the best estimates of these quantities (in
the mean-square sense). In other words, when the probabilities $P(1/S)$ and
$P(2/S)$ are not known, the theory recommends the use of the best mean-square
estimates, however $p(\theta)$ may be chosen. Maximum-likelihood estimates, which
in a sense eliminate the a priori density problem, can also be used. Thus, by
utilizing the value of $\theta$ that maximizes $P(S/\theta)$, say $\hat{\theta}$,

$$f(1, S) = A \, P(2/S, \hat{\theta}).$$


For large values of $N$ it is expected (from the asymptotic theory) that these
various estimates will not differ much. Consider, for example, the case where the
successive bits are independent for each value of $\theta$, so that

$$f(2, S) = A \frac{\int \theta \, P(S/\theta) \, p(\theta) \, d\theta}{\int P(S/\theta) \, p(\theta) \, d\theta}.$$

If $p(\theta) = 1$, $0 \le \theta \le 1$, and $P(S/\theta) = \theta^{\nu} (1 - \theta)^{N - \nu}$,
corresponding to $\nu$ "one" states, is chosen, then

$$f(2, S) = A \frac{\int_0^1 \theta^{\nu + 1} (1 - \theta)^{N - \nu} \, d\theta}{\int_0^1 \theta^{\nu} (1 - \theta)^{N - \nu} \, d\theta} = A \frac{\nu + 1}{N + 2}.$$

On the other hand, if the maximum-likelihood estimate is used,
$f(2, S) = A \nu / N$, which substantiates our observation.
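The two estimates of $f(2, S)$ can be compared exactly. The sketch below evaluates the ratio of integrals with exact rational arithmetic, using $\int_0^1 \theta^a (1 - \theta)^b \, d\theta = a!\,b!/(a + b + 1)!$ for nonnegative integers $a$, $b$, confirms that it reduces to $(\nu + 1)/(N + 2)$, and shows how close it is to the maximum-likelihood value $\nu/N$ as $N$ grows. Amplitudes are expressed per unit $A$, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact value of the integral of theta^a (1 - theta)^b over [0, 1], a, b >= 0 integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def bayes_f2(nu, n):
    """f(2, S)/A with uniform prior p(theta) = 1 and nu 'one' states among n digits."""
    return beta_integral(nu + 1, n - nu) / beta_integral(nu, n - nu)


def ml_f2(nu, n):
    """f(2, S)/A using the maximum-likelihood estimate theta_hat = nu / n."""
    return Fraction(nu, n)


for nu, n in [(1, 4), (7, 10), (700, 1000)]:
    b, m = bayes_f2(nu, n), ml_f2(nu, n)
    print(nu, n, b, float(b), float(m), float(abs(b - m)))
```

For $\nu = 1$, $N = 4$ the two settings are $1/3$ and $1/4$; for $\nu = 700$, $N = 1000$ they differ by only $4/10020 \approx 4 \times 10^{-4}$, illustrating the asymptotic agreement.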

The minimax estimates can also be obtained as in decision theory, but due 
allowance must be made for the fact that this may be overly pessimistic. Again, 
asymptotically, many of these estimates may be equivalent, but it is not clear 
what the practical implication of asymptotic optimality is. 

It should be noted that successive digits were assumed to be independent. 
This is of course an ad hoc assumption as is the parametric model. Clearly, 
little more than a statement of the problem has been achieved. Since the object 
here was to point out some of the general features and some of the theory that 
needs to be developed, the groundwork has been laid for the next example. 

RECEIVER ADAPTIVE SYSTEMS 

This second example considers a system in which the adaptive learning 
feature is at the receiver, but, as with the previous example, the nature of the 
theory involved will be treated without going into specific details. For this
example, a PCM system employing "space diversity" (i.e., $n$ receivers tuned to
the same transmitter) will suffice. In this system, the signal received at the
$i$th station can be expressed by

$$f_i(t) = k_i s(t) + N_i(t), \quad 0 < t < T, \quad i = 1, \ldots, n,$$
where $N_i(t)$ is the noise (for instance, additive Gaussian) at each receiver,
$s(t)$ is the information-carrying waveform

$$s(t) = s_k(t), \quad k = 1, \ldots, M,$$

the set of which is known at each receiver, and $k_i$ is the amplitude factor,
which may vary from receiver to receiver. Where the $(k_i)$, or the ratios of the
various $k_i$, are known, the optimal receiver that minimizes the probability of
error and uses all of the received signals $(f_i(t))$ can be derived. The point
is that it is possible to do better by a suitable combination of the $f_i(t)$
than by individual processing (detecting) at each receiver followed by some kind
of majority logic to decide which of the signals $(s_k(t))$ was transmitted. The
optimal receiver structure can be easily derived by considering $f_i(t)$ as a
"multiple" process and setting

$$F(t) = \mathrm{col}\,[f_1(t), \ldots, f_n(t)],$$
$$S(t) = \mathrm{col}\,[k_1 s(t), k_2 s(t), \ldots, k_n s(t)],$$
$$S_i(t) = \mathrm{col}\,[k_1 s_i(t), k_2 s_i(t), \ldots, k_n s_i(t)],$$
$$N(t) = \mathrm{col}\,[N_1(t), \ldots, N_n(t)].$$



If we let $R(s, t)$ be the covariance matrix function, then

$$R(s, t) = E[N(s) N(t)^*],$$

where $(\,)^*$ denotes the conjugate transpose. If it is assumed that the signals
$s_k(t)$ are equally likely, it can be shown (Reference 6) that the $k$th signal
should be chosen if, for every $i$,

$$\int_0^T F(t)^* h_k(t) \, dt - \frac{1}{2} \int_0^T h_k(t)^* S_k(t) \, dt \ge \int_0^T F(t)^* h_i(t) \, dt - \frac{1}{2} \int_0^T h_i(t)^* S_i(t) \, dt,$$

where

$$\int_0^T R(s, t) \, h_i(t) \, dt = S_i(s).$$


The simplest case occurs when

$$R(s, t) = \Phi I \, \delta(s - t),$$

where $I$ is the unit matrix; this corresponds to the case where the noises are
pure white and independent with the same spectral density $\Phi$. A somewhat more
general case is

$$R(s, t) = D \, \delta(s - t),$$

where $D$ is a diagonal matrix with positive entries which, perhaps, correspond
to different noise temperatures at the receivers. In this case

$$h_i(t) = D^{-1} S_i(t) = \mathrm{col}\left[ \frac{k_1}{d_1} s_i(t), \frac{k_2}{d_2} s_i(t), \ldots, \frac{k_n}{d_n} s_i(t) \right],$$

where $(d_i)$ are the diagonal elements of $D$. Also, if it is assumed that all
signals have the same power, consistent with the assumption that all signals are
equally likely, then

$$\int_0^T h_i(t)^* S_i(t) \, dt = PT \sum_{j=1}^{n} \frac{k_j^2}{d_j}$$

for every $i$, where

$$P = \frac{1}{T} \int_0^T s_i(t)^2 \, dt.$$





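By substituting these results into the decision rules, the expressions reduce to
choosing the $k$th signal if

$$\int_0^T F(t)^* D^{-1} S_k(t) \, dt = \max_i \int_0^T F(t)^* D^{-1} S_i(t) \, dt$$

or, equivalently,

$$\sum_{j=1}^{n} \frac{k_j}{d_j} \int_0^T f_j(t) \, s_k(t) \, dt = \max_i \sum_{j=1}^{n} \frac{k_j}{d_j} \int_0^T f_j(t) \, s_i(t) \, dt.$$

In other words, each receiver waveform $f_i(t)$ has to be weighted by the
corresponding signal-to-noise ratio $k_i/d_i$.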
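A discrete-time sketch of this weighted-combining rule follows. Integrals are replaced by sums over samples, the gains $k_j$ and noise densities $d_j$ are taken as known (the non-adaptive case just derived), and the candidate signals are equal-energy, mutually orthogonal sinusoids. These choices, the numerical values, and the function name are illustrative assumptions.

```python
import math
import random


def combined_decision(received, signals, k, d):
    """Choose the index i maximizing sum_j (k_j/d_j) * sum_t f_j[t] s_i[t].

    received -- list of n sampled receiver waveforms f_j
    signals  -- list of M sampled candidate waveforms s_i (equal energy assumed)
    k, d     -- per-receiver gains k_j and noise densities d_j
    """
    def score(s_i):
        return sum(
            (k_j / d_j) * sum(f_t * s_t for f_t, s_t in zip(f_j, s_i))
            for f_j, k_j, d_j in zip(received, k, d)
        )

    return max(range(len(signals)), key=lambda i: score(signals[i]))


L = 32  # samples per signaling interval
signals = [[math.cos(2 * math.pi * (m + 1) * t / L) for t in range(L)] for m in range(4)]
k = [1.0, 0.5, 0.2]  # hypothetical path gains
d = [1.0, 1.0, 4.0]  # hypothetical noise densities
rng = random.Random(1)
sent = 2
received = [[k_j * s + rng.gauss(0.0, math.sqrt(d_j) * 0.1) for s in signals[sent]]
            for k_j, d_j in zip(k, d)]
print(combined_decision(received, signals, k, d))
```

A weak, noisy receiver (here the third one) contributes to the decision only in proportion to $k_j/d_j$, rather than receiving an equal vote as in majority logic.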


For the adaptive feature, it is noted that the (k_i), or their ratios, in a diversity system are usually unknown or are subject to variation. Since the noise temperatures are usually known, the (d_i) are known. Thus, a learning phase is included in which the (k_i) are estimated, followed by an "adaptive" phase in which the weighting factors are adjusted accordingly. Such a problem for the case of binary signals and white noise has been treated by Price (Reference 1). One method of estimation applies when it is possible, by a suitable operation at the receiver, to transform S_i(t) into a single waveform regardless of what i may be. For example, if the system is polyphase (a fixed phase angle 2πk/M, where k = 1, ..., M, corresponds to each signal), then multiplying the phase by the factor M removes the modulation uncertainty.
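The phase-multiplication idea can be illustrated with complex baseband samples. This sketch assumes noise-free M-ary PSK samples r = k·exp(j2πm/M); raising each sample to the M-th power multiplies its phase by M, which removes the unknown symbol m, so the gain k can be read off without knowing which signal was sent. The function name is hypothetical; with noise present the M-th powers would have to be averaged over many samples, and the estimate degrades at low signal-to-noise ratio.

```python
import cmath


def mpsk_gain(samples, M):
    """Estimate k from samples r = k*exp(j*2*pi*m/M) with unknown symbols m.

    r**M = k**M * exp(j*2*pi*m) = k**M, so the symbol drops out.
    """
    mean_mth_power = sum(r ** M for r in samples) / len(samples)
    return abs(mean_mth_power) ** (1.0 / M)


k, M = 0.7, 4
symbols = [0, 3, 1, 1, 2, 0, 3]
samples = [k * cmath.exp(2j * cmath.pi * m / M) for m in symbols]
print(mpsk_gain(samples, M))             # close to 0.7
print(abs(sum(samples) / len(samples)))  # plain averaging does not recover k
```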


The problem of estimating the parameters (k_i) by "maximum likelihood" or a similar criterion remains. In general, if it is assumed that all the waveforms s_j(t) are equally likely and that the (k_i) are mutually independent, then maximizing the likelihood function gives, for each k_i, the setting that maximizes

$$\sum_{j=1}^{M}\exp\Big\{-\frac{1}{2d_i}\int_0^T\big[f_i(t)-k_i s_j(t)\big]^2\,dt\Big\},$$

or, by setting the derivative with respect to k_i equal to zero,

$$\sum_{j=1}^{M}\int_0^T\big[f_i(t)-k_i s_j(t)\big]\,s_j(t)\,dt\;\exp\Big\{-\frac{1}{2d_i}\int_0^T\big[f_i(t)-k_i s_j(t)\big]^2\,dt\Big\} = 0.$$

With

$$\int_0^T s_j(t)^2\,dt = PT,$$

the setting becomes

$$\sum_{j=1}^{M}\Big[\int_0^T f_i(t)s_j(t)\,dt\Big]\exp\Big[\frac{k_i}{d_i}\int_0^T f_i(t)s_j(t)\,dt\Big] = k_i\,PT\sum_{j=1}^{M}\exp\Big[\frac{k_i}{d_i}\int_0^T f_i(t)s_j(t)\,dt\Big].$$
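The last equation defines k_i only implicitly. One simple way to solve it, assumed here and not taken from the text, is fixed-point iteration: start from a guess, evaluate the ratio of the two sums divided by PT, and repeat. The sketch takes the correlator outputs c_j = ∫ f_i(t)s_j(t)dt as precomputed inputs; the name `ml_gain` is hypothetical, convergence is not guaranteed in general, and the starting value can matter.

```python
import math


def ml_gain(correlations, energy, d, k0=1.0, iters=500, tol=1e-12):
    """Fixed-point solution of
        sum_j c_j exp(k c_j / d) = k * PT * sum_j exp(k c_j / d)
    where c_j = integral f_i(t) s_j(t) dt, energy = PT and d = d_i.
    """
    k = k0
    for _ in range(iters):
        exponents = [k * c / d for c in correlations]
        top = max(exponents)  # shift exponents to avoid overflow
        weights = [math.exp(e - top) for e in exponents]
        k_new = sum(c * w for c, w in zip(correlations, weights)) / (
            energy * sum(weights))
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k


# Noise-free record 2*s_1(t) with four orthogonal unit-energy signals.
c = [2.0, 0.0, 0.0, 0.0]
print(ml_gain(c, 1.0, 0.01))   # small d_i: recovers 2.0
print(ml_gain(c, 1.0, 100.0))  # large d_i: pulled toward the mean correlation
```

When d_i is large the exponential weights become nearly uniform, so the estimate drifts toward the average correlation divided by PT rather than toward the true gain.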


The adaptive problem, however, has only begun. An optimum estimate of the (k_i) should be based on a minimum probability of error, using any a priori statistics available. Any optimum procedure devised would have to be evaluated in terms of the probability of error that results. Extension to noncoherent transmission is a natural one, since practical systems utilizing adaptive estimation (such as optical communication systems, where diversity can be of use to offset the effect of multiple scattering) would be noncoherent.

ADAPTIVE DATA COMPRESSION SYSTEM 

As a final example, a system is considered that is adaptive at both the 
transmitter and the receiver. This system is more truly adaptive in the sense 
that it features a self-monitoring function in addition to the learning and ad- 
justing functions. In contrast to the preceding section, nonparametric methods 
will now be used. 

Let x(t) represent a continuous or discrete parameter source. It is customary to assume that x(t) can then be regarded (at least for analytical purposes) as a stochastic process. If the time parameter t is continuous, then it is possible to assume that the expression

$$x(t) = \int_{-B}^{B} e^{2\pi i f t}\,d\Phi(f)$$

is adequate (depending on whether we adopt the stochastic or nonstochastic viewpoint) for a sufficiently large bandwidth B. By the well-known "sampling principle," one can represent the continuous waveform by periodic samples taken at t = k/(2B). In most communication systems using sampled data of this kind, the sampling rate is determined by the nominal, highest, or cutoff frequency
B expected in the data. However, in many kinds of data (such as in space teleme- 
try data) the actual cutoff frequency will be small most of the time, so there is 
no need to sample at the nominal rate of 2B; that is, it is possible to "compress" 
the data sampling rate or channel bandwidth. A method of achieving such com- 
pression in PCM systems is to exploit the redundancy or predictability of the 
data. Thus, a "predictor" is employed at the transmitter which predicts the data 
at time t + A based on the data up to time t. The actual value observed is then 
compared with that predicted. If the difference exceeds (in absolute value) a 
preset threshold, then the actual sample is transmitted. If it is below the 
threshold, a prearranged code word using one digit could be transmitted instead 
of the m digits; this reduces the number of digits transmitted per second. A 
comma-free code may be used to sort the two kinds of words unambiguously. 
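The transmit-or-flag scheme can be sketched as follows. The straight-line extrapolation used as `predict` is a hypothetical stand-in for the adaptive predictor of Reference 2, and the token format ('S' for a transmitted sample, 'P' for the one-digit "use the prediction" word) is illustrative; a real system would use a comma-free code. The transmitter predicts from its own copy of the reconstructed data, so the receiver can regenerate every nontransmitted sample exactly.

```python
def predict(history):
    """Hypothetical predictor: straight-line extrapolation of the last two values."""
    if len(history) < 2:
        return None
    return 2 * history[-1] - history[-2]


def compress(samples, threshold):
    """Return tokens: ('S', x) = sample sent, ('P',) = one-digit 'use prediction'."""
    tokens, recon = [], []  # recon mirrors what the receiver will hold
    for x in samples:
        pred = predict(recon)
        if pred is None or abs(x - pred) > threshold:
            tokens.append(("S", x))
            recon.append(x)
        else:
            tokens.append(("P",))
            recon.append(pred)
    return tokens


def decompress(tokens):
    recon = []
    for token in tokens:
        recon.append(token[1] if token[0] == "S" else predict(recon))
    return recon


data = [0, 1, 2, 3, 4, 5, 5, 5, 5]
tokens = compress(data, threshold=0.5)
print(sum(t[0] == "S" for t in tokens))  # -> 3 samples actually sent
print(decompress(tokens))                # -> [0, 1, 2, 3, 4, 5, 5, 5, 5]
```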

The adaptive or learning feature is part of the predictor mechanism. The pre- 
dictor is not operative until the threshold is exceeded and thus exhibits the self- 
monitoring feature. The predictor itself is based on a learning phase and the 
prediction operator is adjusted accordingly. For details on the predictor itself, 
see Reference 2. The significance of the adaptive prediction feature lies in the fact that no a priori statistics or other assumptions concerning the data are required and, in particular, that there may be periods in the data where prediction (to the set quality) is not possible.

The basic prediction philosophy may be indicated briefly. A waveform of duration T (a function x(t), 0 ≤ t < T) is given, and the prediction of the value at time T + Δ is required. Any prediction operation is to be based on these data alone, since no additional a priori knowledge is available. This is, of course, an ancient problem; however, we wish to treat it strictly in the telemetry data processing context and note two major points of view in dealing with it. One, which may be considered the "numerical analysis" point of view, consists in assuming that any physically realized waveform must be analytic and can be approximated by polynomials. The data may thus be "fitted" to a polynomial of high degree, which is then used to "predict" the future values. The other and more recent view is the statistical view, in which the data are assumed to be a finite sample of a stationary stochastic process whose average properties, such as moments and/or distributions, are known or calculable from the data. For a process of a given description, the well-developed mean square prediction theory can be applied. Perhaps the main advantage of this view is that it gives a quantitative, although theoretical, notion of the error in prediction. The problem of measuring the average statistics from a finite sample can be quite delicate, however. In the polynomial fitting method, the interpretation of the prediction error (which is the crucial point) is linked with what degree of polynomial is used and what portion of the data is fitted, and is therefore more nebulous. Moreover, as ordinarily used, the fitting operations on the data are linear.

A rationale is adopted for prediction which is free from a priori assump- 
tions concerning the "model." In a general sense, what is involved in both of the 
above methods is (1) "modelmaking", consistent with the data and (2) using the 
numbers derived from the model to perform some optimal operations. If an 
understanding of the mechanism generating the model is desired, the first step 
is essential. If prediction is wanted, then it shall be shown that it is possible to 
proceed directly (and hence more optimally) to the best prediction without the 
intermediate step of modelmaking. It may be, of course, that several philoso- 
phies lead to the same operations on the data. Even here, the present method 
offers some practical advantages. It is only natural to use the philosophy that 
requires the least number of a priori assumptions. 


Any prediction is an operation or operator on part of the data, and in this case the finite part is all that is available. The main point of departure in the view being taken is that if a prediction operator is obtained which, based on all the available input data, functions optimally in the immediate past relative to the point where the prediction is required, the most meaningful solution to the prediction problem is achieved. Thus, the only basis on which an a priori judgment can be made on any prediction method is to "back off" slightly from the present and compare the actual available data with the values predicted by the given prediction operator. This can be formulated analytically as follows. Let the total available data be described as a function x(t), 0 ≤ t ≤ T, and let it be required to predict the value at T + Δ, where Δ > 0 is a small fraction of T. Next consider the data in the interval 0 < t < T − Δ. If, for any t_0 in this interval, a prediction of the function value Δ ahead is made from the past values up to t_0, the predicted value is

$$x^*(t_0+\Delta),$$

and the error can be explicitly observed as

$$x^*(t_0+\Delta) - x(t_0+\Delta).$$

At this point it should be noted that the general prediction operator will be a function of a finite segment of the past. If the length of this segment is denoted by S, then

$$x^*(t_0+\Delta) = \phi\big[\,x(\sigma),\; t_0 - S < \sigma \le t_0\,\big], \qquad (2)$$

where φ represents the prediction operator. The error criterion must be specified next. Here the mean square error has been chosen because it is analytically simpler, is almost universally used, and makes possible a comparison with other methods. The kind of solution presented is based on successive approximation, as opposed to an analytic solution or to other measures of error, which risk greater complexity. As far as the rationale of the method is concerned, this is more a matter of detail than of principle. Optimality is determined by using


$$\int_{T-\Delta-L}^{T-\Delta}\big[x^*(t+\Delta) - x(t+\Delta)\big]^2\,dt, \qquad (3)$$

where L ≥ S, and by continuing the procedure on the basis that the operator φ which minimizes Equation 3 is the best. Equation 3 can be generalized to

$$\int_{T-\Delta-L}^{T-\Delta} p(t)\,C\big[x^*(t+\Delta) - x(t+\Delta)\big]\,dt, \qquad (4)$$


where p(t) is a positive weight function and C( ) is a symmetric positive cost function. Before the problem of determining the optimal operator φ is considered, an important consistency principle should be noted. If the data are regarded as one long sample of an ergodic process, then, in the statistical sense, Equation 3 yields the optimal operator exactly; however, it is not necessary to make any such assumption concerning the data, nor to compute the average statistics first. The point is that, although the data may not be enough for determining the spectrum, they may be quite adequate for the prediction itself. Unlike polynomial fitting, the operations on the data can be as nonlinear as necessary, and at the same time Equation 3 can be normalized to

$$\epsilon^2 = \frac{\int_{T-\Delta-L}^{T-\Delta}\big[x^*(t+\Delta) - x(t+\Delta)\big]^2\,dt}{\int_{T-\Delta-L}^{T-\Delta} x^2(t+\Delta)\,dt}, \qquad (5)$$

thus yielding a quantitative measure by which to judge how good the prediction will be. For actual methods of determining the optimum predictor, see Reference 2.
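A discrete sketch of the "back off" rule: each candidate one-step predictor is run over the last L predictions that can already be checked against data in hand, the squared errors are summed (the discrete analog of Equation 3), the best candidate is kept, and its error is normalized as in Equation 5. The three polynomial extrapolators are a hypothetical candidate family chosen only for illustration; the method itself allows any operator, linear or not, and Reference 2 describes the actual predictor.

```python
CANDIDATES = {  # hypothetical family: weights on x[t], x[t-1], x[t-2]
    "hold": [1.0],
    "linear": [2.0, -1.0],
    "quadratic": [3.0, -3.0, 1.0],
}


def predict_next(weights, x, t):
    """One-step prediction of x[t+1] from x[t], x[t-1], ..."""
    return sum(w * x[t - j] for j, w in enumerate(weights))


def backoff_select(x, L):
    """Return (name, prediction of the next value, normalized error eps^2)."""
    n = len(x)
    if not 1 <= L <= n - 3:
        raise ValueError("need 1 <= L <= len(x) - 3")
    checks = range(n - 1 - L, n - 1)  # predictions of x[t+1] already verifiable
    best = None
    for name, weights in CANDIDATES.items():
        err = sum((predict_next(weights, x, t) - x[t + 1]) ** 2 for t in checks)
        if best is None or err < best[1]:
            best = (name, err)
    name, err = best
    energy = sum(x[t + 1] ** 2 for t in checks)
    eps2 = err / energy if energy else float("inf")
    return name, predict_next(CANDIDATES[name], x, n - 1), eps2


print(backoff_select([t * t for t in range(10)], 5))  # quadratic data
```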

In the system, the receiver is assumed to perform a prediction operation similar to that performed by the transmitter; that is, the receiver predicts the values of the nontransmitted data using the transmitted data as well as previously predicted sections of the data. This means that the receiver must provide the same level of prediction-operator complexity as the transmitter. In particular, the transmitter itself has to base its prediction, where necessary, on the predicted values of the data points whose errors did not exceed the threshold.

So far, the optimal adaptive structure has been determined, but the system must still be evaluated. It is natural to ask how much compression can be obtained in this way on the average. Here some structure must be postulated for the data source, and the resulting compression must then be calculated for the adaptive system. The following briefly indicates what an analysis of this type involves. To simplify the analysis, the prediction operation in the transmitter is considered to be based on the actual samples only. The data samples are denoted by (x_n). It is assumed that the data can be taken as a stationary stochastic process. It is further assumed that the process is Gaussian, since the maximum prediction error (and hence the minimum compression) occurs in this case, and the prediction operation then includes only linear operations. When the adaptive predictor is also constrained to be linear, the optimal filter weights a_j are determined by minimizing

$$\frac{1}{N}\sum_{n=-N+1}^{0}\Big(x_n - \sum_{j=1}^{m} a_j x_{n-j}\Big)^2, \qquad (6)$$

where the available past data consist of N samples and the prediction is based on m samples (by analogy with the method above). The corresponding (squared) error is then
$$e_0^2[m;N] = \Big(x_1 - \sum_{j=1}^{m} a_j x_{1-j}\Big)^2. \qquad (7)$$


The transmission is based on Equation 7 exceeding a threshold τ. Hence, the statistics of Equation 7 must first be determined, noting that the optimal (a_k) that minimize Equation 6 satisfy

$$\sum_{n=-N+1}^{0} x_n x_{n-k} = \sum_{j=1}^{m} a_j \sum_{n=-N+1}^{0} x_{n-j}\,x_{n-k}, \qquad k = 1, \ldots, m.$$


If Y_n is the N-column vector (x_n, x_{n-1}, ..., x_{n-N+1})', then the minimum of Equation 6 becomes

$$\frac{1}{N}\,\frac{D_{m+1}}{D_m},$$

where D_{m+1} is the determinant of the (m + 1) by (m + 1) matrix with entries

$$Y_i \cdot Y_j, \qquad i, j = 0, -1, \ldots, -m,$$

and D_m is the determinant of the m by m matrix with entries Y_i · Y_j, i, j = −1, −2, ..., −m.


On the other hand, the error of interest (Equation 7) is

$$e_0^2[m;N] = \Big(\frac{D'_{m+1}}{D_m}\Big)^2, \qquad (8)$$

where D'_{m+1} has the first row

$$x_1,\; x_0,\; x_{-1},\; \ldots,\; x_{-m+1}$$

and is otherwise the same as D_{m+1}. The statistics of Equation 8 are of primary interest. In this case it is necessary to know

$$\Pr\big\{e_0^2[m;N] \ge \tau\big\},$$

which is the probability of threshold exceedance and relates directly to the attainable compression. Actually, Equation 8 determines only the "open-loop" error.
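The determinant expressions can be checked numerically. The sketch below, assuming arbitrary pseudo-random data and plain Python, solves the normal equations for the a_j by Cramer's rule, then compares the minimum of Equation 6 with D_{m+1}/(N D_m) and the open-loop error x_1 − Σ a_j x_{1−j} with D'_{m+1}/D_m (whose square is Equation 8). The helper names `det` and `check_identities` are hypothetical.

```python
import random


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, result = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def check_identities(x, N, m):
    """x maps index n to x_n for n = -N-m+1, ..., 1.

    Returns (min of Eq. 6, D_{m+1}/(N D_m), open-loop error e_0, D'_{m+1}/D_m).
    """
    def Y(n):  # N-column vector (x_n, x_{n-1}, ..., x_{n-N+1})
        return [x[n - l] for l in range(N)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    idx = [-i for i in range(m + 1)]               # 0, -1, ..., -m
    G = [[dot(Y(i), Y(j)) for j in idx] for i in idx]
    lower = [row[1:] for row in G[1:]]             # indices -1, ..., -m
    D_m1, D_m = det(G), det(lower)
    rhs = [row[0] for row in G[1:]]                # Y_{-k} . Y_0
    a = [det([r[:j] + [b] + r[j + 1:] for r, b in zip(lower, rhs)]) / D_m
         for j in range(m)]                        # Cramer's rule

    def pred(n):
        return sum(a[j - 1] * x[n - j] for j in range(1, m + 1))

    min_eq6 = sum((x[n] - pred(n)) ** 2 for n in range(-N + 1, 1)) / N
    e0 = x[1] - pred(1)
    D_prime = det([[x[1 - l] for l in range(m + 1)]] + G[1:])
    return min_eq6, D_m1 / (N * D_m), e0, D_prime / D_m


rng = random.Random(1)
N, m = 30, 3
x = {n: rng.gauss(0.0, 1.0) for n in range(-N - m + 1, 2)}
result = check_identities(x, N, m)
print(result)  # entries 0 and 1 agree, as do entries 2 and 3
```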

To evaluate the complete system, the "closed-loop" operation must be considered, 
where actual samples are replaced by predicted samples when the former fall 
within the set threshold. Needless to say, the analysis tends to get involved 
even for the Gaussian model. 
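Even the open-loop exceedance probability is awkward to obtain analytically, but it can be estimated by simulation. This sketch assumes a hypothetical unit-variance Gaussian AR(1) source with correlation `rho`, a one-weight (m = 1) least-squares predictor learned from the N preceding samples, and open-loop operation only. The digit-cost model (one flag digit per sample, plus b digits whenever the sample itself is sent) is also an assumption and not the comma-free code of the text.

```python
import random


def exceed_probability(rho, N, tau, trials=5000, seed=0):
    """Monte Carlo estimate of Pr{e_0^2[1;N] >= tau} (open loop, m = 1)
    for a hypothetical unit-variance Gaussian AR(1) source."""
    rng = random.Random(seed)
    scale = (1.0 - rho * rho) ** 0.5
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0)]                # x_{-N}, stationary start
        for _ in range(N + 1):                   # x_{-N+1}, ..., x_0, x_1
            x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
        past = x[:-1]                            # x_{-N}, ..., x_0
        num = sum(past[n] * past[n - 1] for n in range(1, len(past)))
        den = sum(past[n - 1] ** 2 for n in range(1, len(past)))
        a = num / den                            # least-squares weight (Eq. 6)
        if (x[-1] - a * x[-2]) ** 2 >= tau:      # open-loop error (Eq. 7)
            hits += 1
    return hits / trials


def digits_per_sample(p_exceed, b):
    """Assumed cost: 1 flag digit always, plus b digits when the sample is sent."""
    return 1 + p_exceed * b


p = exceed_probability(rho=0.95, N=30, tau=0.25)
print(p, 6 / digits_per_sample(p, 6))  # exceedance rate, ratio for 6-digit words
```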

It is equally important to know what the "quality" of the compressed data 
will be. The cross-correlation coefficient or other equivalent quantitative index 
may be used and evaluated as a function of τ, m, and N. A complete system
evaluation would also require that the channel noise be taken into account. Such 
an analytic evaluation is necessary to provide a logical basis for choosing opti- 
mum design parameters. Indeed, if the channel noise characteristics are known, 
then appropriate coding or signal selection has to be considered in arriving at 
the total compression possible. 

In all of these cases cited, the adaptive theory winds up with an "optimum" 
procedure. It is necessary to be able to compare such procedures on a quanti- 
tative basis. To evaluate how good the system actually is, however, it is neces- 
sary to specify the class of input or system parameters, or their statistics if 
they are regarded as randomly varying. This has many difficulties since, by its 
very adaptive nature, the system has to cope with conditions which cannot be 
preassigned. One way out, as indicated, is that of asymptotic optimality; how- 
ever, this is not entirely satisfactory either. 

In conclusion, it should be noted that adaptive methods in communication 
theory are still in their formative stages. Before any general theory can be 
evolved some specific problems, such as the ones mentioned, need to be explored 
in detail. The scope of the theory is quite far-reaching and the examples dis- 
cussed are intended primarily to state some specifics rather than vague gener- 
alities. 
The optimum operations for the "pure" prediction (no noise) of a discrete time series {s_i} with respect to a mean square error cost function are well known when the data probability distributions are given. Unfortunately, there are many practical situations (References 1, 2, and 3) where it is not possible to specify these probability distributions, and an adaptive technique is required in order to approach the minimum variance. There is no universally agreed-upon adaptive procedure, but most systems rely upon the minimization of an appropriate quadratic form in the data close to the point to be predicted. In one such system (References 1 through 4), the predictor form is taken to be a linear weighting of the values immediately preceding the point to be predicted, the number of values (memory) being fixed a priori. The sample mean square error (fitting error) is minimized with respect to the weights in finding the adaptive predictor. In Reference 4 it is shown that the mean square prediction error for stationary Gaussian data with square integrable spectral density and zero mean is

$$\sigma^2[M,N] = \sigma^2[M,\infty]\Big(1 + \frac{M}{N}\Big) + o\Big(\frac{M}{N}\Big),$$

where M is the memory and N (the learning period) is the number of predictions used in calculating the sample mean square error. It is desirable to keep N small in order to minimize the effect of nonstationarities. On the other hand, the variance does not generally decrease monotonically with M for fixed values of N < ∞, since σ²[M,∞] is a decreasing function of M which approaches a constant, and (1 + M/N) is an increasing function of M. Thus, there is an M < ∞ such that σ²[M,N] is minimized (assuming σ²[∞,∞] > 0). If something is known about the behavior of σ²[M,∞], the optimum value of M can be estimated a priori. Usually this is not the case.
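The trade-off between the decreasing σ²[M,∞] and the increasing factor (1 + M/N) can be illustrated numerically. The sequence σ²[M,∞] = 0.01 + 0.5·0.5^M used below is hypothetical, and only the asymptotic formula without its o(M/N) term is evaluated; the function name is illustrative.

```python
def best_memory(sigma_inf, N):
    """sigma_inf[M-1] holds sigma^2[M, inf] for M = 1, 2, ...
    Returns (M, value) minimizing sigma^2[M, inf] * (1 + M/N)."""
    value, M = min((s * (1 + M / N), M) for M, s in enumerate(sigma_inf, start=1))
    return M, value


sigma_inf = [0.01 + 0.5 * 0.5 ** M for M in range(1, 21)]  # hypothetical
print(best_memory(sigma_inf, 30)[0])    # -> 10
print(best_memory(sigma_inf, 1000)[0])  # -> 15
```

A longer learning period weakens the 1 + M/N penalty, so the best memory moves upward, which is why the procedure below adjusts M and N together.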
This paper is concerned with the estimation of σ²[M,N] from the data for the same Gaussian data model. Unfortunately, the fitting error is a poor estimate, since it decreases monotonically with M even though σ²[M,N] is not monotonic. However, an appropriately weighted function of the fitting error can be used. On the basis of this estimate it is then possible to specify the best value of M for fixed values of N, and then to specify a value of N so that the factor σ²[M,N]/σ²[∞,∞] is sufficiently small. The values of M/N considered are "intermediate" in the sense that terms of o(M/N) in asymptotic expansions can be neglected whereas the M/N term cannot. Specifically, it is shown that if F²(M,N) is the fitting error, or sample prediction error variance, then

$$E\big[F^2(M,N)\big] = \sigma^2[M,\infty]\Big(1 - \frac{M}{N}\Big) + o\Big(\frac{M}{N}\Big).$$

Thus an asymptotically unbiased estimate of σ²[M,N] is

$$\hat\sigma^2[M,N] = F^2(M,N)\,\frac{1 + M/N}{1 - M/N}.$$

The minimization of this quantity as a function of M, over all values of M up to some maximum value M_max such that M_max/N ≪ 1, provides an estimate of the optimum memory M_0. The quantity

$$\hat\sigma^2[M_{\max},\infty] = \frac{F^2(M_{\max},N)}{1 - M_{\max}/N}$$

gives an estimate of σ²[M_max,∞]. Now the learning period is adjusted to a value N_0 so that

$$\frac{\hat\sigma^2[M_0,N_0]}{\hat\sigma^2[M_{\max},\infty]} \le 1 + \epsilon,$$

where ε is an allowed deviation of the ratio of the mean square error to the minimum variance from unity. This procedure is motivated by the desire to bring the mean square error within ε of the optimum while keeping N_0 as small as possible. No optimality is claimed for the procedure except possibly in the limit as ε → 0. However, to the author's knowledge there is no other available procedure to accomplish this. The nonlinear relations between the data and the predictor when the data probability distributions are unknown make an exact analysis extremely difficult if not impossible.

The results of computer simulations are presented in a later section to 
verify the effectiveness of the procedure. 

DESCRIPTION OF PREDICTOR 

As noted in the introduction, linear predictions of the time series {s_i} are sought in the form

$$\hat s_0 = S_0'\,B_{MN},$$

where S_0 and S_i are M-column vectors

$$S_i = \{s_{i-j},\; j = 1, \ldots, M\},$$

and B_MN is a weight vector determined adaptively for each value of the predictor memory M from sample predictions of the N points preceding the point to be predicted (taken for convenience to be s_0, the value at the time origin). The sample error variance is then minimized to determine B_MN. The resulting minimum is the fitting error

$$F^2(M,N) = \min_{B_{MN}}\; \frac{1}{N}\sum_{i=1}^{N}\big(s_{-i} - S_{-i}'B_{MN}\big)^2. \qquad (1)$$


This quadratic form has a minimum for

$$\frac{1}{N}\sum_{i=1}^{N} S_{-i}\big(s_{-i} - S_{-i}'B_{MN}\big) = 0.$$

For convenience, this is expressed in the matrix form

$$g_{MN} = \Gamma_{MN}\,B_{MN}, \qquad (2)$$

where g_MN is a column vector and Γ_MN is a matrix given by

$$g_{MN} = \frac{1}{N}\sum_{i=1}^{N} s_{-i}\,S_{-i}$$

and

$$\Gamma_{MN} = \frac{1}{N}\sum_{i=1}^{N} S_{-i}\,S_{-i}'.$$

Thus (assuming Γ_MN is nonsingular),

$$B_{MN} = \Gamma_{MN}^{-1}\,g_{MN}. \qquad (3)$$


Now, given the fitting error F²(M,N), an adaptive procedure must be found by which M and N can be determined. This cannot be done from F²(M,N) directly because, although it converges to σ²[M,∞] as N → ∞, for fixed values of N it decreases monotonically with M (as the number of degrees of freedom increases), even though the mean square prediction error does not (Reference 4). The determination of the adaptive procedure must be postponed until the mean fitting error E[F²(M,N)] is evaluated.
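The fitting error of Equations 1 through 3 can be computed directly. This sketch, assuming plain Python lists and a small Gaussian-elimination solver (the names `solve` and `fitting_error` are hypothetical), builds g_MN and Γ_MN from the N rows preceding s_0, solves for B_MN, and returns F²(M,N). Because each larger memory only adds regressors to the same N rows, the computed F² can only decrease with M, which is the monotonicity noted above.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    T = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= f * T[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (T[r][n] - sum(T[r][k] * x[k] for k in range(r + 1, n))) / T[r][r]
    return x


def fitting_error(s, t0, M, N):
    """F^2(M, N) of Equation 1, with s[t0] playing the role of s_0.

    Row i (i = 1..N) has target s_{-i} and regressor (s_{-i-1}, ..., s_{-i-M}).
    """
    if t0 - N - M < 0:
        raise ValueError("not enough past data")
    rows = [(s[t0 - i], [s[t0 - i - j] for j in range(1, M + 1)])
            for i in range(1, N + 1)]
    gamma = [[sum(v[a] * v[b] for _, v in rows) / N for b in range(M)]
             for a in range(M)]                                  # Gamma_MN
    g = [sum(y * v[a] for y, v in rows) / N for a in range(M)]   # g_MN
    B = solve(gamma, g)                                          # Equation 3
    F2 = sum((y - sum(bj * vj for bj, vj in zip(B, v))) ** 2 for y, v in rows) / N
    return F2, B


rng = random.Random(3)
s = [rng.gauss(0.0, 1.0) for _ in range(100)]
F = [fitting_error(s, 99, M, 40)[0] for M in range(1, 9)]
print([round(v, 4) for v in F])  # nonincreasing in M
```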

MEAN FITTING ERROR 

The analysis of the mean fitting error proceeds as in Reference 4. The coefficient vector B_MN is a highly nonlinear function of the data, as shown by Equation 3. However, asymptotically* (References 4 and 5), if the process spectral density is square integrable,

$$B_{MN} = \Gamma_{MN}^{-1}g_{MN} = B_{M\infty} + R_M^{-1}\big(g_{MN} - \Gamma_{MN}B_{M\infty}\big) + \epsilon_{MN}, \qquad (4)$$


*To make this expression rigorously correct, a constraint of the form ||B_MN|| ≤ CN^P, C > 0, P > 0, is required to take care of singularities in Γ_MN. This is readily done by modifying the definition of Equation 3. Since Equation 1 is usually minimized by a gradient technique rather than by inverting Γ_MN, no practical difficulties are encountered with this definition.
where

$$R_M \equiv E[\Gamma_{MN}] = M \times M \text{ covariance matrix of } S_i, \qquad g_M \equiv E[g_{MN}],$$

and

$$E\big[\|\epsilon_{MN}\|^2\big] = o\Big(\frac{M}{N}\Big).$$

From the square integrability of the process spectral density it follows that R_M is nonsingular for M < ∞ (Reference 4). Hence,

$$B_{M\infty} = R_M^{-1}g_M = \lim_{N\to\infty}\Gamma_{MN}^{-1}g_{MN}$$

is the optimum set of coefficients. The second term in the expansion of Equation 4 represents the effects of "small" statistical fluctuations, and ε_MN is the remaining error, whose contribution goes to zero in the mean square sense faster than that of the second term as M/N → 0. The mean square prediction error is, by definition,

$$\sigma^2[M,N] = E\big[s_0 - S_0'B_{MN}\big]^2.$$

The establishment of its asymptotic value is outlined as a prelude to finding the 
mean fitting error. Details can be found in Reference 4. 

This expectation can be expanded in terms of the minimum variance coefficients B_M∞ as

$$\sigma^2[M,N] = E\big[s_0 - S_0'B_{M\infty}\big]^2 + E\Big\{2\big[s_0 - S_0'B_{M\infty}\big]\big[S_0'(B_{M\infty} - B_{MN})\big] + \big[S_0'(B_{M\infty} - B_{MN})\big]^2\Big\}.$$

As shown in Reference 4, by use of the asymptotic expansion of Equation 4, this reduces to

$$\sigma^2[M,N] = \sigma^2[M,\infty]\Big(1 + \frac{M}{N}\Big) + o\Big(\frac{M}{N}\Big). \qquad (5)$$
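The same method can be used to evaluate the expected fitting error. From Equation 1, by using the solution of Equation 3 for the coefficients, the expected value of the fitting error is

$$E\big[F^2(M,N)\big] = E\Big[\frac{1}{N}\sum_{i=1}^{N}\big(s_{-i} - S_{-i}'B_{MN}\big)^2\Big] = E\Big[\frac{1}{N}\sum_{i=1}^{N} s_{-i}\big(s_{-i} - S_{-i}'B_{MN}\big)\Big],$$

the second form following from Equation 2. By expanding about the optimum coefficients and using Equation 4,

$$\begin{aligned}
E\big[F^2(M,N)\big] &= \frac{1}{N}\sum_{i=1}^{N}\Big\{E\big[s_{-i}(s_{-i} - S_{-i}'B_{M\infty})\big] - E\big[s_{-i}S_{-i}'(B_{MN} - B_{M\infty})\big]\Big\}\\
&= \sigma^2[M,\infty] - E\big[g_{MN}'\Gamma_{MN}^{-1}(g_{MN} - \Gamma_{MN}B_{M\infty})\big]\\
&= \sigma^2[M,\infty] - E\Big[\big(B_{M\infty} + R_M^{-1}(g_{MN} - \Gamma_{MN}B_{M\infty}) + \epsilon_{MN}\big)'(g_{MN} - \Gamma_{MN}B_{M\infty})\Big]\\
&= \sigma^2[M,\infty] - E\big[(g_{MN} - \Gamma_{MN}B_{M\infty})'R_M^{-1}(g_{MN} - \Gamma_{MN}B_{M\infty})\big] + o\Big(\frac{M}{N}\Big), \qquad (6)
\end{aligned}$$

where the last step uses E[g_MN − Γ_MN B_M∞] = g_M − R_M B_M∞ = 0.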

The expected value of the quadratic form in Equation 6 remains to be evaluated. Since {s_i} is stationary, E[s_i s_{i+j}] = E[s_0 s_j]. Therefore
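$$\begin{aligned}
E\big[(g_{MN} - \Gamma_{MN}B_{M\infty})'R_M^{-1}(g_{MN} - \Gamma_{MN}B_{M\infty})\big]
&= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N} E\big[(s_{-i} - S_{-i}'B_{M\infty})\,S_{-i}'R_M^{-1}S_{-k}\,(s_{-k} - S_{-k}'B_{M\infty})\big]\\
&= \frac{1}{N}\sum_{i=-N}^{N}\Big(1 - \frac{|i|}{N}\Big)E\big[(s_0 - S_0'B_{M\infty})\,S_0'R_M^{-1}S_i\,(s_i - S_i'B_{M\infty})\big].
\end{aligned}$$

From the square integrability of the process spectral density it follows that the right-hand side equals

$$\frac{1}{N}\sum_{i=-\infty}^{\infty}E\big[(s_0 - S_0'B_{M\infty})\,S_0'R_M^{-1}S_i\,(s_i - S_i'B_{M\infty})\big] + o\Big(\frac{M}{N}\Big). \qquad (7)$$

Since {s_i} is a Gaussian time sequence and E[S_i(s_i − S_i'B_M∞)] = 0, the series reduces to

$$\sum_{i=-\infty}^{\infty}\Big\{E\big[(s_0 - S_0'B_{M\infty})(s_i - S_i'B_{M\infty})\big]\,E\big[S_0'R_M^{-1}S_i\big] + E\big[(s_0 - S_0'B_{M\infty})S_i'\big]\,R_M^{-1}\,E\big[S_0(s_i - S_i'B_{M\infty})\big]\Big\}. \qquad (8)$$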


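Note that if

$$f(\lambda) = \sum_{n=-\infty}^{\infty} R(n)\,e^{in\lambda}$$

is the cross spectral density of arbitrary random processes, then for two such densities f_1 and f_2 with real covariance sequences R_1(n) and R_2(n),

$$\frac{1}{2\pi}\int_0^{2\pi} f_1(\lambda)\,f_2^*(\lambda)\,d\lambda = \sum_{n=-\infty}^{\infty} R_1(n)\,R_2(n).$$

Therefore Expression 8 can be put in the spectral form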


$$\frac{1}{2\pi}\int_0^{2\pi}\big|1 - e_M'(\lambda)B_{M\infty}\big|^2 p(\lambda)\; e_M^*(\lambda)R_M^{-1}e_M(\lambda)\,p(\lambda)\,d\lambda + \cdots,$$

where p(λ) is the process spectral density and e_M(λ) is the M-column vector

$$e_M(\lambda) = \{e^{im\lambda},\; m = 1, \ldots, M\}.$$

Here ( )' denotes transpose and ( )* denotes complex conjugate transpose. Now R_M^{-1} can be expressed as the product of a triangular matrix with its transpose,

$$R_M^{-1} = A_M'A_M,$$

where A_M S_i is a random vector with uncorrelated components. Therefore, Expression 8 becomes
But it is well known (e.g., Reference 6) that

$$\big|1 - e_k'(\lambda)B_{k\infty}\big|^2\,p(\lambda)$$

converges in measure to σ²[k,∞]. Hence, Expression 9 becomes

$$M\,\sigma^2[M,\infty] + o(M). \qquad (10)$$

Substituting Equation 10 into Equation 7 and Equation 7 into Equation 6 results in

$$E\big[F^2(M,N)\big] = \sigma^2[M,\infty]\Big(1 - \frac{M}{N}\Big) + o\Big(\frac{M}{N}\Big). \qquad (11)$$
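Equations 5 and 11 can be checked with a small Monte Carlo run. The sketch assumes white unit-variance Gaussian data, for which σ²[M,∞] = 1 for every M, and memory M = 1, so the least-squares weight has a closed form. Averaged over many independent records, F² should come out near 1 − M/N, the actual prediction error near 1 + M/N, and the corrected estimate of Equation 12 near the latter. The trial count and seed are arbitrary choices, and `monte_carlo` is a hypothetical name.

```python
import random


def monte_carlo(N, trials=20000, seed=11):
    """White unit-variance Gaussian data, memory M = 1.

    Returns averages of F^2(1, N), of the actual prediction error
    (s_0 - B s_{-1})^2, and of the corrected estimate of Equation 12.
    """
    rng = random.Random(seed)
    M = 1
    f2_sum = pred_sum = est_sum = 0.0
    for _ in range(trials):
        s = [rng.gauss(0.0, 1.0) for _ in range(N + 2)]  # s_{-N-1}, ..., s_0
        targets, lagged = s[1:-1], s[:-2]                # s_{-i} and s_{-i-1}
        B = sum(y * v for y, v in zip(targets, lagged)) / sum(v * v for v in lagged)
        F2 = sum((y - B * v) ** 2 for y, v in zip(targets, lagged)) / N
        f2_sum += F2
        pred_sum += (s[-1] - B * s[-2]) ** 2
        est_sum += F2 * (1 + M / N) / (1 - M / N)
    return f2_sum / trials, pred_sum / trials, est_sum / trials


f2, pred, est = monte_carlo(20)
print(f2, pred, est)  # expected near 0.95, 1.05 and 1.05 for N = 20
```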

MEAN SQUARE PREDICTION ERROR ESTIMATION 

A comparison of Equations 5 and 11 immediately suggests the estimator

$$\hat\sigma^2[M,N] = F^2(M,N)\,\frac{1 + M/N}{1 - M/N}. \qquad (12)$$

The estimate of Equation 12 has the property that

$$E\big[\hat\sigma^2[M,N]\big] = \sigma^2[M,N] + o\Big(\frac{M}{N}\Big). \qquad (13)$$


The value of M for which Equation 12 is a minimum, denoted M_0, is an estimate of the value of M to be used in predicting s_0 to minimize the error variance for the learning period N. The minimization is over the values of M up to some chosen maximum value M_max. The value of N is chosen so that M_max/N ≪ 1 (typically 0.2 to 0.4). Then an estimate of the minimum variance σ²[M_max,∞] is taken as

$$\hat\sigma^2[M_{\max},\infty] = \frac{F^2(M_{\max},N)}{1 - M_{\max}/N}.$$

Then N is adjusted to a value N_0 such that

$$\frac{\hat\sigma^2[M_0,N_0]}{\hat\sigma^2[M_{\max},\infty]} \le 1 + \epsilon,$$

where ε is an allowable deviation of the mean square prediction error from the optimum (typically about 0.2).
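The complete selection rule can be sketched as follows, repeating the fitting-error computation so the block stands alone. The defaults M_max = 8, N starting at 20 in steps of 10, and ε = 0.2 mirror the example in the next section; the AR(2) record and all function names are hypothetical. One consequence of Equation 12 is visible in the code: since σ̂²[M_0,N] ≤ σ̂²[M_max,N] = (1 + M_max/N)·σ̂²[M_max,∞], the stopping condition is always met once N ≥ M_max/ε (here N = 40), apart from floating-point rounding at that boundary.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    T = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= f * T[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (T[r][n] - sum(T[r][k] * x[k] for k in range(r + 1, n))) / T[r][r]
    return x


def fitting_error(s, t0, M, N):
    """F^2(M, N) with s[t0] as s_0 (Equations 1 to 3)."""
    rows = [(s[t0 - i], [s[t0 - i - j] for j in range(1, M + 1)])
            for i in range(1, N + 1)]
    gamma = [[sum(v[a] * v[b] for _, v in rows) / N for b in range(M)]
             for a in range(M)]
    g = [sum(y * v[a] for y, v in rows) / N for a in range(M)]
    B = solve(gamma, g)
    return sum((y - sum(bj * vj for bj, vj in zip(B, v))) ** 2 for y, v in rows) / N


def select_memory_and_period(s, t0, M_max=8, N_start=20, N_step=10, eps=0.2):
    """Return (M_0, N_0, sigma-hat^2[M_0, N_0]) following the rule in the text."""
    N = N_start
    while True:
        if t0 - N - M_max < 0:
            raise ValueError("record too short for this learning period")
        F2 = {M: fitting_error(s, t0, M, N) for M in range(1, M_max + 1)}
        est = {M: F2[M] * (1 + M / N) / (1 - M / N) for M in F2}  # Equation 12
        M0 = min(est, key=est.get)
        sigma_max_inf = F2[M_max] / (1 - M_max / N)  # estimate of sigma^2[M_max, inf]
        if est[M0] / sigma_max_inf <= 1 + eps:
            return M0, N, est[M0]
        N += N_step


rng = random.Random(5)
s = [0.0, 0.0]
for _ in range(300):  # hypothetical AR(2) record
    s.append(1.5 * s[-1] - 0.7 * s[-2] + rng.gauss(0.0, 1.0))
M0, N0, estimate = select_memory_and_period(s, len(s) - 1)
print(M0, N0, estimate)
```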

The analysis of the effect of the error term in Equation 13 is difficult. However, since it is the large changes in σ²[M,N] that are most important, the higher-order effects are negligible for "sufficiently small" values of M_max/N. Unlike the fitting error itself, the estimate of σ²[M,N] so produced has a minimum. Another difficulty lies in determining the variance of the prediction error when this procedure is used. The calculation of this variance involves determining, as a function of N and M, the probability that σ̂²[M,N] is a minimum. Since this is extremely complicated, computer simulations have been used to verify the effectiveness of the procedure. Some results are given in the next section.

RESULTS 

The prediction procedure described above has been extensively verified by computer simulation. Independent Gaussian random variables are generated and operated on by an appropriate digital filter to obtain data that have a given spectral density. The prediction procedure is repeated independently 10,000 times to give a 90 percent confidence interval of ±2.33 percent about the true value of σ²[M_0,N_0]. In general, the method has produced good results. The following typical results are presented for one spectral density. The spectral density is chosen, as in Reference 4, to be bandlimited with a small sample-to-sample independent "noise" component:


$$p(\lambda) = \begin{cases} 4.996, & |\lambda| \le 0.2\pi,\\ 0.001, & |\lambda| > 0.2\pi. \end{cases}$$
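The example spectrum can be turned into autocorrelations and exact finite-memory error variances. For n ≠ 0 the autocorrelation is c(n) = 4.995·sin(0.2πn)/(πn), with c(0) = 1. The Levinson-Durbin recursion, a standard technique that the text does not itself use, then gives σ²[M,∞] for M = 0, ..., 8, which can be compared with the value quoted below. The infinite-memory limit exp{(1/2π)∫log p(λ)dλ} (the Kolmogorov-Szegő formula) is about 0.0055 for this spectrum. Function names are hypothetical.

```python
import math


def autocorrelation(n):
    """c(n) for p = 4.996 on |lambda| <= 0.2*pi and 0.001 elsewhere."""
    if n == 0:
        return 4.996 * 0.2 + 0.001 * 0.8
    return (4.996 - 0.001) * math.sin(0.2 * math.pi * n) / (math.pi * n)


def levinson_errors(c, order):
    """Prediction-error variances sigma^2[M, inf], M = 0..order (Levinson-Durbin)."""
    a, err, errors = [], c[0], [c[0]]
    for m in range(1, order + 1):
        k = (c[m] - sum(a[j - 1] * c[m - j] for j in range(1, m))) / err
        a = [a[j - 1] - k * a[m - j - 1] for j in range(1, m)] + [k]
        err *= 1.0 - k * k
        errors.append(err)
    return errors


c = [autocorrelation(n) for n in range(9)]
sigma2 = levinson_errors(c, 8)
limit = math.exp(0.2 * math.log(4.996) + 0.8 * math.log(0.001))
print([round(v, 5) for v in sigma2], round(limit, 5))
```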


The memory and learning period are constrained by

$$M_{\max} = 8, \qquad N \ge 20.$$

The values of M_0 and N satisfy

$$\frac{\hat\sigma^2[M_0,N]}{\hat\sigma^2[M_{\max},\infty]} \le 1.2;$$

N is changed by 10 at a time to satisfy this inequality, starting with N = 20. The results obtained were

Prediction error variance over 10,000 samples = σ² = 0.0120,
Average learning period = N_0 = 31.0, and
Average memory = M_0 = 5.95.

The actual minimum variance for this spectral density is found to be

$$\sigma^2[8,\infty] = 0.00802.$$

Thus

$$\frac{\sigma^2}{\sigma^2[8,\infty]} = 1.50,$$

which is reasonably close to the desired maximum value of 1.2.
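The ±2.33 percent figure quoted above and the ratio 1.50 can be reproduced with a short calculation. Assuming each of the 10,000 squared prediction errors is an independent draw of a squared zero-mean Gaussian error, its variance is 2σ⁴, so the sample mean has relative standard deviation √(2/n), and the 90 percent two-sided half-width is about 1.645·√(2/n). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def relative_halfwidth(n, confidence=0.90):
    """Half-width, as a fraction of sigma^2, of a two-sided confidence interval
    for the mean of n independent squared zero-mean Gaussian errors."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(2.0 / n)


print(round(100 * relative_halfwidth(10_000), 2))  # -> 2.33 (percent)
print(round(0.0120 / 0.00802, 2))                  # -> 1.5
```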
In Reference 4 the mean square prediction error for these data was a minimum for a memory of 4 with a fixed learning period of 30, the minimum being 0.0117 on the basis of computer simulations. Since this is close to σ² = 0.0120, the procedure comes very close to the minimum for a learning period of 30. Therefore, it can be concluded that the procedure is reasonably effective for these data. Further computer simulations have borne out this conclusion more generally. For smaller values of ε and M_max/N, the desired value of 1 + ε is naturally approached more closely.

SOME ADDITIONAL COMMENTS 

It is recognized that, in the use of the foregoing procedure, it will not be detected if σ²[M,∞] decreases significantly for M > M_max. However, since the procedure must be terminated somewhere, this risk will always exist.

Although in the absence of other information it seems most reasonable to 
weight those points closest to the value to be predicted, it is worth mentioning 
that the above analysis can be carried through for any linear predictor with M 
degrees of freedom. Thus an even more complicated procedure could be based 
on the minimum of the fitting error over all linear predictors with M degrees of 
freedom for various values of M. 

An unknown factor is the effect of departures from the Gaussian assumptions 
as well as the effect of nonstationarities. Each possible factor has to be con- 
sidered on its own merit. 
3. RESULTS OF ADAPTIVE PREDICTOR STUDIES, PART 1

R. L. Kutz

Goddard Space Flight Center

Greenbelt, Maryland

N67-27405

The data used for this study are taken from ten Tiros cloud pictures. The 
Tiros video data are normally stored on analog tape at the ground station. To 
be useful for digital processing, the analog data are sampled, quantized to six 
bits, and stored on digital magnetic tape with one sample per tape character. 

A digital computer is used to remove the line synchronization pulses that are 
necessary only for the analog picture display equipment. The fifth picture of 
the series is used in the following discussion as a typical example of the ten- 
picture group. An example of an analog picture produced by the Tiros kinescope 
ground station display equipment is shown in Figure 3-1 and the digital version 
of the same picture after the first computer processing is shown in Figure 3-2. 
All digital pictures are displayed using the Stromberg Carlson 4020 high-speed microfilm recorder. In order to evaluate the effect of the processing on the picture, it is necessary to redisplay cloud pictures which have been subjected to data compression.

The first data compression technique to be discussed is that of linear prediction (Reference 1). In this type of prediction, a weighting of past data samples is used to produce an estimate of the sample to be predicted. The first point of interest is to determine the memory M, i.e., how many past samples need to be weighted to get the best estimate of the following data sample in the sequence.

This was done by using L (the learning period) data samples to produce a set of weights, which are used to predict the following L/2 samples with an error threshold T_1 = 0. Figure 3-3 indicates that M = 3 produces the minimum rms prediction error for L = 20. Figure 3-4 shows the effect on the rms learning and prediction errors of changing L with M fixed at 3.
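The memory scan can be sketched as follows. With T_1 = 0 every past sample reaches the receiver exactly, so the sketch simply learns least-squares weights from L samples and measures the rms error on the next L/2. The Tiros data are not available here, so the usage runs on a hypothetical sampled cosine, for which a memory of 2 is exact; the function names are illustrative.

```python
import math


def solve_normal_equations(X, y):
    """Least-squares weights for rows X and targets y (Gauss-Jordan on X'X w = X'y)."""
    m = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(m)] +
         [sum(r[a] * t for r, t in zip(X, y))] for a in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(m):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [u - f * v for u, v in zip(A[i], A[c])]
    return [A[i][m] / A[i][i] for i in range(m)]


def rms_prediction_error(s, start, L, M):
    """Learn M weights on s[start-L:start], then predict s[start:start+L//2]."""
    def regressor(t):
        return [s[t - j] for j in range(1, M + 1)]

    learn = range(start - L, start)
    w = solve_normal_equations([regressor(t) for t in learn], [s[t] for t in learn])
    test = range(start, start + L // 2)
    sq = [(s[t] - sum(wj * xj for wj, xj in zip(w, regressor(t)))) ** 2 for t in test]
    return math.sqrt(sum(sq) / len(sq))


line = [math.cos(0.3 * n) for n in range(100)]  # hypothetical scan line
print({M: rms_prediction_error(line, 40, 20, M) for M in (1, 2)})
```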

It is undesirable to use zero-error data samples to produce a set of weights because of the implication that error-free data samples are available at the receiver. Error-free data samples arrive at the receiver in two ways: either they are predicted with no error or they are transmitted through the error-free channel. For the data used here, it was found that L of the learning samples frequently had to be transmitted. The savings to be achieved through data compression are negligible if many sets of weights are required, as is frequently the case.
Figure 3-1. Analog Picture Produced by the TIROS Kinescope
Ground Station Display Equipment


As explained in Reference 1, half of the learning period was made up of
elements transmitted a priori and the other half was made up of processed data
having an error of less than T1. A rule requiring no special coding is used to
determine when a new set of weights is to be generated. Since the weights must
be identical at the transmitter and the receiver, they must be derived from an
identical learning set of points. This rule tests a second threshold, T2,
against the mean square prediction error on transmitted elements every time L/2
samples have been processed. Threshold T2 is chosen by compressing the same
data for several values of T2, with the other parameters fixed, and picking the
value of T2 that maximizes the compression. A value of T2 that is too small
will constantly demand new weights, and therefore much of the data will be
transmitted without any attempt at prediction. On the other hand, a value of T2
that is too large will cause few sets of weights to be produced, so a set of
weights may often be used over portions of the data where it performs poorly.
Figure 3-3. Prediction and Learning Errors Versus Memory (ordinate: rms error
in quantum steps)


With a memory of 3 and a constant term, 10 weights are required for D = 2. The
results are shown in Table 3-1 for comparative values of M and L. Here the
learning period is too short for 10 weights, which would account for the poor
results.
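The weight count can be checked by enumerating the product terms. This sketch assumes that "uniquely indexed" products include repeated indices (squares such as x1*x1). Under that reading, a memory of 3 with a constant term and D = 2 gives the 10 weights stated above, whereas counting only products of distinct samples would give 7. In general the count is C(M + D, D). The function name `product_terms` is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def product_terms(memory, degree):
    """List the sample-index products weighted by a degree-D predictor.

    Each term is a tuple of lags (1 = most recent sample); the empty tuple is
    the constant term. Each index combination appears once, so x1*x2 and x2*x1
    are the same term, while squares such as x1*x1 are included (an assumption).
    """
    lags = range(1, memory + 1)
    terms = []
    for d in range(degree + 1):
        terms.extend(combinations_with_replacement(lags, d))
    return terms


terms = product_terms(memory=3, degree=2)
print(len(terms), comb(3 + 2, 2), terms)
```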

The method used to find the weights which would minimize the mean square 
fitting error over the learning period involves the solution of simultaneous 
equations. Both the "steepest descent" (SD) and "conjugate gradient" (CG) 
methods of solution for simultaneous equations have been tried. The CG method 
converges in fewer iterations than does the SD method, but the SD method has 
fewer matrix operations per iteration than does the CG method and the fitting 
error is produced as part of each cycle, which is of interest in controlling the 
quality of fit. 
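To illustrate the trade-off between the two solvers, the sketch below applies textbook steepest descent and conjugate gradient iterations to a small symmetric positive definite system of the kind the normal equations produce. The matrix is an arbitrary illustrative example, not data from this study. In exact arithmetic, conjugate gradients terminate in at most M iterations for an M x M system, while steepest descent generally needs many more; the residual r = b - Ax, available at every steepest-descent cycle, is what allows the fit to be monitored as it proceeds.

```python
import math


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def steepest_descent(a, b, tol=1e-10, max_iter=10_000):
    """Solve a x = b (a symmetric positive definite). Returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - a x, updated every cycle
    for k in range(max_iter):
        rr = dot(r, r)
        if math.sqrt(rr) < tol:
            return x, k
        ar = matvec(a, r)
        alpha = rr / dot(r, ar)  # exact line search along the residual
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, ar)]
    return x, max_iter


def conjugate_gradient(a, b, tol=1e-10, max_iter=10_000):
    """Solve a x = b (a symmetric positive definite). Returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)  # search direction
    rr = dot(r, r)
    for k in range(max_iter):
        if math.sqrt(rr) < tol:
            return x, k
        ap = matvec(a, p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * apj for ri, apj in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pj for ri, pj in zip(r, p)]
        rr = rr_new
    return x, max_iter


normal_matrix = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # illustrative
rhs = [1.0, 2.0, 3.0]
w_sd, n_sd = steepest_descent(normal_matrix, rhs)
w_cg, n_cg = conjugate_gradient(normal_matrix, rhs)
print("steepest descent iterations:", n_sd, "conjugate gradient iterations:", n_cg)
```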
* Zero-order hold.

** Old method: learn 1/2 on predicted elements and 1/2 on unprocessed
(zero-error) data.




Figure 3-4. Prediction and Learning Error Versus Learning Period (L)


In general, the compression ratio is quite sensitive to the quality of the fit
over the learning period. Depending on the size of M and L, the closest fit over
the learning period does not always produce the highest compression ratio.
However, for the values of M and L considered here and for D = 1, a minimum
mean square fit gives the best results. In principle, the constant term
(Reference 1) should not be important for a minimal fit solution. As Figure 3-5
shows, this is not the case when the SD solution is used: the constant affects
the magnitude of the terms and hence emphasizes the truncation error. An IBM
7094 computer with eight decimal digits of accuracy is used for this
simulation.

The linear predictor with M = 0 and L = 1, which is the zero-order hold, is
used as a standard for comparing the various data compression techniques.
Table 3-1 compares two classes of predictors, linear and Markov, with five
variations of the linear predictor, including the zero-order hold.

Figure 3-5. Element Compression and E (Fitting Error) Versus Constant Term (K)

There are three compression ratios shown in Table 3-1. The first is the element
compression ratio, defined by


$$\text{Element compression} = \frac{\text{number of TV elements in the original digital picture}}{\text{number of TV elements not predicted (i.e., the number of transmitted elements)}}$$

and q = 1/(element compression) is the estimate of the probability of
transmission, treated as if such a probability existed. The second and third
compression ratios give the bandwidth reduction, the reduction in equivalent
transmission time, or the power reduction achievable in a noiseless channel.
The second compression ratio is produced using run-length coding. Here,
run-length coding (Method 2) uses an indicator bit with each transmitted data
word to indicate whether the remaining word bits are a TV intensity element or
a portion of a run-length word. The third compression ratio is calculated from
the entropy equation, using q as the probability of transmission and N = 5 as
the number of bits used to represent each TV element's intensity:

$$\text{Bit compression (entropy code)} = \frac{N}{qN - q\log_2 q - (1-q)\log_2(1-q)}$$
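The entropy figure can be evaluated directly. A reading consistent with the formula is that qN bits per element carry the transmitted intensities, while H(q) = -q log2 q - (1 - q) log2(1 - q) bits per element tell the receiver which elements were transmitted. The function name is hypothetical.

```python
import math


def entropy_bit_compression(element_compression, n_bits=5):
    """Bit compression N / (qN + H(q)) with q = 1 / element compression.

    H(q) = -q log2 q - (1 - q) log2(1 - q) is the per-element cost of flagging
    each element as transmitted or predicted (0 log 0 is taken as 0).
    """
    q = 1.0 / element_compression
    if not 0.0 < q <= 1.0:
        raise ValueError("element compression must be >= 1")
    h = -q * math.log2(q)
    if q < 1.0:
        h -= (1.0 - q) * math.log2(1.0 - q)
    return n_bits / (q * n_bits + h)


for ec in (1.0, 2.0, 4.0, 8.0):
    print(ec, round(entropy_bit_compression(ec), 3))
```

With no prediction (element compression 1), the bit compression is exactly 1; the flagging overhead H(q) is why the bit compression always trails the element compression.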


The usual assumption is made in the above equation, i.e., that all intensities
are equally likely after prediction, which is the worst case. In Reference 1,
another type of run-length coding, called Method 1 coding, is discussed. Using
Method 1 coding, K bits are added to each transmitted TV element to allow for
run lengths up to 2^K elements. However, Method 1 coding has the disadvantage
that, if no compression is possible, the data expand from N bits per element to
N + K bits per element. The second compression figure in Table 3-1, labeled bit
compression (Method 2), also results in a slight expansion where no compression
is possible; however, the expansion is small, from N bits per sample to N + 1
bits per sample.
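The bit count for Method 2 coding can be sketched as follows. The text does not specify how long runs are packed, so this sketch assumes that each run word holds a count of 1 to 2^N - 1 predicted elements and that longer runs are split over several words; every word, whether intensity or run, is N + 1 bits long. The function names are hypothetical.

```python
def method2_bits(transmitted, n_bits=5):
    """Count bits for a Method 2 style run-length code (sketch under assumptions).

    `transmitted` holds one boolean per TV element (True = not predicted, so the
    intensity is sent). Each word is 1 indicator bit plus n_bits payload bits:
    either an intensity or the length of a run of predicted elements.
    ASSUMPTION: a run word holds 1 .. 2**n_bits - 1; longer runs use more words.
    """
    max_run = 2 ** n_bits - 1
    words = 0
    run = 0
    for sent in transmitted:
        if sent:
            if run:
                words += -(-run // max_run)  # ceil(run / max_run) run words
                run = 0
            words += 1  # one intensity word
        else:
            run += 1
    if run:
        words += -(-run // max_run)
    return words * (n_bits + 1)


def method2_compression(transmitted, n_bits=5):
    """Original bits (N per element) divided by coded bits."""
    return len(transmitted) * n_bits / method2_bits(transmitted, n_bits)


flags = [True] + [False] * 9 + [True] + [False] * 5
print(method2_bits(flags), round(method2_compression(flags), 3))
print(method2_compression([True] * 10))  # nothing predicted: N -> N + 1 expansion
```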

Table 3-1 shows that, of the linear predictors, the zero-order hold performs
best. The results shown for the Markov predictor, which are discussed in detail
in Part II of this study*, are more favorable than those of the linear
predictors.
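The zero-order hold benchmark with an error threshold can be sketched as below. It assumes that an element is predicted (not transmitted) when its error against the held value is at most T1, and that the first element is always sent. The element compression ratio and q then follow from the definitions given above. The function name is hypothetical.

```python
def zero_order_hold(samples, t1):
    """Zero-order-hold prediction with error threshold t1 (sketch).

    The prediction for each element is the last value the receiver holds. If the
    prediction is within t1 of the true value, the element is not transmitted and
    the receiver repeats the held value; otherwise the true value is sent.
    ASSUMPTIONS: errors up to and including t1 are accepted; the first element
    is always transmitted.
    Returns (receiver reconstruction, list of transmitted flags).
    """
    reconstruction, flags = [], []
    held = None
    for x in samples:
        if held is not None and abs(x - held) <= t1:
            flags.append(False)
        else:
            held = x
            flags.append(True)
        reconstruction.append(held)
    return reconstruction, flags


line = [10, 10, 11, 14, 14, 13]  # illustrative intensities
recon, sent = zero_order_hold(line, t1=1)
element_compression = len(line) / sum(sent)
q = 1 / element_compression
print(recon, element_compression, q)
```

Because the transmitter predicts from the held (reconstructed) value rather than the true previous value, transmitter and receiver stay in step and every reconstruction error stays within T1.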

A Markov predictor with a memory of 1 is shown in Table 3-1 for two reasons:
because of computer storage size, no cases with memories greater than 2 have
been programmed for five-bit data, and a memory of 1 is easier to discuss and
graph. Learning over 16 lines, as with the Markov predictor, implies a slower
adaptation of the predictor than with the linear predictor, which also has a
memory of 1 but learns over 10 to 14 elements. Table 3-1 shows the linear
predictor with a memory of 1 for D = 1 and D = 2. As before, the D = 2 case
performs worse than the D = 1 case. It should be noted that the linear
predictor with a memory of 1 performs nearly as well as the linear predictor
with a memory of 3.

Figure 3-6 shows a comparison between the linear and Markov predictors for a
memory of 1 and a learning period that uses all of the picture elements. To get
a prediction using Figure 3-6, the previous element value is placed on the
abscissa and, using the desired curve, the predicted value is read on the
ordinate. There is a noticeable deviation between the two methods for
intensities greater than 25 as well as for intensities less than 10. This
difference in the prediction curves of Figure 3-6 indicates a difficulty in the
linear predictor that could account for its poor performance: one may have to
use a prediction curve that is not optimal, in the mean square error sense, for
some of the data intensities. A method of combining the rapid learning of the
linear predictor with the optimality of the Markov predictor at each intensity
amplitude is given in Reference 2 as Method 3.

*Paper 6 of this conference, prepared by J. A. Sciulli, Goddard Space Flight
Center.

Figure 3-6. TV Element to be Predicted Versus TV Element on Which the
Prediction is Based

It should be mentioned that, rather than predicting with the conditional
expectation (i.e., the conditional mean, as reported herein), one could use the
conditional mode. The conditional mode has been used in Reference 3 by
L. D. Davisson, with somewhat better results than those obtained using the
conditional mean.
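A memory-1 Markov predictor of the kind plotted in Figure 3-6 can be sketched by tabulating, for each previous intensity, how often each next intensity occurs. Predicting with the conditional mean or the conditional mode is then a one-line choice. This is an illustrative reconstruction, not the program used in the study; the function names and the tie-breaking rule are assumptions.

```python
from collections import Counter, defaultdict


def learn_transitions(samples):
    """Count how often each intensity follows each previous intensity (memory 1)."""
    table = defaultdict(Counter)
    for prev, nxt in zip(samples, samples[1:]):
        table[prev][nxt] += 1
    return table


def conditional_mean(table, prev):
    """Mean of the next intensity given the previous one."""
    if prev not in table:
        raise KeyError(f"intensity {prev} never seen during learning")
    counts = table[prev]
    return sum(v * c for v, c in counts.items()) / sum(counts.values())


def conditional_mode(table, prev):
    """Most frequent next intensity; ties go to the smaller intensity (arbitrary)."""
    if prev not in table:
        raise KeyError(f"intensity {prev} never seen during learning")
    counts = table[prev]
    return max(sorted(counts), key=lambda v: counts[v])


table = learn_transitions([0, 1, 0, 1, 0, 2])  # illustrative data
print(conditional_mean(table, 0), conditional_mode(table, 0))
```

Unlike a single straight line, the table gives an independent prediction at every intensity, which is the per-amplitude optimality the text attributes to the Markov predictor.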

Figures 3-7 through 3-10 show the quality of TV pictures that have been
compressed, transmitted through an error-free communication channel, and
displayed after expansion. The compression technique used is linear prediction
with M = 3, L = 20, and D = 1 for T1 = 2.5 (Figure 3-7), T1 = 3.5 (Figure 3-8),
Figure 3-2. Digital Version of Figure 3-1 After the
First Computer Processing


To simplify the selection of T2 and avoid the transmission of too many data
samples, a different compromise is made: the learning set is made up entirely
of processed data, which may have sample errors as large as T1.

As might be expected, a small loss in compression is encountered for fixed
values of T1 and T2 corresponding to the optimum values for the previous method
of learning. However, the only cost of making T2 smaller is now the increased
computer processing time. As shown in Table 3-1, a small value of T2 results in
an improvement in compression over the previous method.

Another possible way to increase the compression is to use nonlinear
prediction. Here a parameter D indicates the highest order of the products
used: all samples of the memory are combined so that each uniquely indexed
product of D samples, of D - 1 samples, and so on down to the individual samples
themselves, is weighted to get each estimate. For example, with a memory of 3
and a constant term, 10 weights are required for D = 2.
Figure 3-7. TV Picture After Compression, Transmission, and
Expansion where T1 = 2.5

T1 = 5.5 (Figure 3-9), and T1 = 7.5 (Figure 3-10). Pictures with T1 > 3.5
would be unusable.

CONCLUSION 

When using linear prediction, higher compression ratios can be achieved by
learning from data for which prediction has been attempted. A memory of 3
gives slightly better results than a memory of 1 using linear prediction, but
the difference is negligible. The zero-order hold is the best of the linear
predictors. If nonlinear terms are introduced, as with D = 2, the compression
ratios are lower than with D = 1 using linear prediction. The two methods
mentioned for solving for the minimum mean square error weights are "Conjugate
Gradients" and "Steepest Descents"; of these, steepest descent is used most.
Table 3-1 shows that a more efficient code could gain only 1 dB at most for the
data being considered. Table 3-1 is also used to compare the linear and Markov
predictors. The Markov predictor performs better than the linear predictors,
probably because of its optimality for all data amplitudes. It is suggested
that Balakrishnan's Method 3 be used to circumvent the problems encountered
with the other two methods.

Figure 3-8. TV Picture After Compression, Transmission, and
Expansion where T1 = 3.5
Figure 3-9. TV Picture After Compression, Transmission, and
Expansion where T1 = 5.5
4. DEMONSTRATION OF A QUANTILE SYSTEM
FOR COMPRESSION OF DATA FROM DEEP SPACE PROBES


T. O. Anderson, I. Eisenberger, W. A. Lushbaugh, and E. C. Posner 

Jet Propulsion Laboratory, CIT
Pasadena, California


THEORETICAL BACKGROUND OF THE QUANTILER 

In circumstances where the location of the experiments does not coincide with
the location of the statistical computations performed on the experimental
data, the sample data must be transmitted through a communications channel
from the point of origin to the processing center. If many experiments are
being performed simultaneously and the same communication channel is used to
transmit the sample values of each experiment, there is a limit to the total
number of observations that can be sent in a given time. This limits the number
of experiments that can be performed for given sample sizes. One way of
effecting "data compression" is to transmit a small number of sample quantiles
(instead of all of the sample values) and to extract the desired statistical
information from them. The following is a brief discussion of the use of sample
quantiles for this purpose.


Let $x_1, x_2, \ldots, x_n$ be $n$ independent sample values taken from a
population with a density function $g(x)$ and a distribution function $G(x)$.
If $g(x)$ is continuous, then $\xi_p$ is said to be the (population) quantile
of order $p$, or the $p$th quantile of the distribution, if $\xi_p$ is the
(unique) solution of the equation $G(\xi) = p$. Equivalently,

$$p = \int_{-\infty}^{\xi_p} g(x)\,dx, \qquad 0 < p < 1.$$


Similarly, a sample quantile of order $p$, $Z_p$, can be defined by arranging
the observations in ascending order of magnitude,

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}.$$

Then

$$Z_p = x_{([np])},$$

where $[np]$ denotes the greatest integer $\le np$. This means that
approximately $100p$ percent of the sample values are smaller than $Z_p$. For
example, $Z_{1/2}$ is well known as the median of the sample distribution.
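The definition of $Z_p$ translates directly into code. Because $[np]$ is 0 when $np < 1$, the sketch clamps the rank to 1 in that case; that choice is an assumption, not taken from the text. The function name is hypothetical.

```python
import math


def sample_quantile(values, p):
    """Z_p = x_([np]): the [np]-th smallest value (1-based), with [np] = floor(n p).

    ASSUMPTION: when n p < 1 the rank is clamped to 1, since x_(0) is undefined.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    ordered = sorted(values)
    rank = max(1, math.floor(len(ordered) * p))
    return ordered[rank - 1]


data = [7, 3, 9, 1, 5, 8, 2, 6, 4, 10]
print(sample_quantile(data, 0.5), sample_quantile(data, 0.25))
```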

Now the sample quantiles, being random quantities, have probability
distributions of their own. Moreover, subject to the condition that the
(original) parent density function possesses a continuous derivative in some
neighborhood of each quantile value being considered, the joint distribution of
any (fixed) number of quantiles has the useful property of approaching the
multivariate normal distribution as the sample size $n \to \infty$
(Reference 1). The mean and variance of the limiting distribution of $Z_p$ are
given in that reference by

$$E(Z_p) = \xi_p$$

and

$$\operatorname{Var}(Z_p) = \frac{p(1-p)}{n\,g^2(\xi_p)}.$$

The correlation $\rho$ between $Z_{p_1}$ and $Z_{p_2}$, for $p_1 < p_2$, is
given by

$$\rho = \left[\frac{p_1(1-p_2)}{p_2(1-p_1)}\right]^{1/2}.$$


Thus, for a sufficiently large sample size, if the limiting normal distribution of 
the sample quantiles is assumed when statistical analyses are based on sample 
quantile values, the error involved in making the normality assumption will be 
small. 
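For a normal parent, the limiting variance and correlation can be evaluated with Python's standard `statistics.NormalDist`. As a check, the median of a normal sample has limiting variance $\pi\sigma^2/(2n)$, which the first function reproduces. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def quantile_variance(p, n, mu=0.0, sigma=1.0):
    """Limiting variance p(1 - p) / (n g(xi_p)^2) of Z_p for a normal parent N(mu, sigma^2)."""
    parent = NormalDist(mu, sigma)
    xi = parent.inv_cdf(p)
    return p * (1 - p) / (n * parent.pdf(xi) ** 2)


def quantile_correlation(p1, p2):
    """Limiting correlation [p1(1 - p2) / (p2(1 - p1))]^(1/2) of Z_p1 and Z_p2, p1 < p2."""
    if not 0.0 < p1 < p2 < 1.0:
        raise ValueError("need 0 < p1 < p2 < 1")
    return math.sqrt(p1 * (1 - p2) / (p2 * (1 - p1)))


print(quantile_variance(0.5, 1024), math.pi / (2 * 1024))  # median versus pi / 2n
print(quantile_correlation(0.25, 0.75))
```

The correlation does not involve the density at all: the density factors cancel between the covariance and the two variances.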

Estimates of the parameters of a normal population have been obtained by using 
quantiles. A measure of the reliability of such an estimate is its variance. 

The relative reliability, called the estimate efficiency, is then defined to be the 
ratio of the variance of the best estimate using the entire sample to that of the 
estimate using quantiles. 
Another type of statistical information obtained by using quantiles is
concerned with tests of simple hypotheses. Here a measure of reliability is the
power of the test, that is, the probability of rejecting the assumed, or null, hypo- 
thesis when it is false. The relative reliability, also called the efficiency of the 
test, is defined in this context to be the ratio of the power of the test using quan- 
tiles to that of the best test using all of the sample values. The task of deter- 
mining the efficiencies of the estimators and tests is simplified considerably by 
the assumption of the normality of the quantiles. 

Data compression is obtained in the following manner. The use of a small 
number of quantiles to obtain statistical information means that only a few ob- 
servations need to be transmitted instead of all of the sample values. The result 
is a high "data compression ratio." This compression ratio would be of little 
value, however, if the relative reliability of the information acquired in this 
manner were proportional to the ratio of the number of quantiles used to the 
total number of observations. The relative reliability, as shown, is not propor- 
tional to this ratio, but depends only upon the number of quantiles used and how 
they are chosen. This accounts for the advocacy of the use of quantiles for data 
compression. 

The primary function then of a "quantiler" is to choose a number of quan- 
tiles of specified orders from a set of samples and transmit these values to 
Earth, where the desired information can then be extracted. Since, for a fixed 
number of quantiles, the relative reliability of a given type of statistical infor- 
mation is critically dependent upon the choice of the quantiles used, it is natural 
that one should specify the orders of the quantiles which maximize the relative 
reliability. However, there are certain restrictions in making this optimal 
choice. 

First of all, the orders of the quantiles might have to be specified in ad- 
vance and remain fixed throughout the flight. Second, the orders of the quantiles 
which maximize the relative reliability of one type of information are not neces- 
sarily those which maximize the relative reliability of another type. Hence, in 
order to use the same set of quantiles for all types of statistical information 
that may be desired, a compromise must be made in the choice of their orders. 

A brief theoretical discussion will therefore be given concerning the optimal
choice of quantiles for estimating the mean $\mu$ and standard deviation
$\sigma$ of a normal parent population. Details are found in Reference 2.

If it is assumed that both the mean $\mu$ and standard deviation $\sigma$ are
unknown, an estimator of $\mu$ using $k \ge 2$ quantiles of distinct arbitrary
orders, of the form

$$\hat{\mu} = \sum_{i=1}^{k} C_i Z_i,$$

is unbiased (has expected value $\mu$) if the $C_i$ are determined such that

$$\sum_{i=1}^{k} C_i = 1 \quad\text{and}\quad \sum_{i=1}^{k} C_i\,\xi_i^* = 0,$$

where each $Z_i$ is of order $p_i$ and $\xi_i^*$ is the population quantile of
order $p_i$ of the standard normal distribution. This holds because, when the
expected value of $\hat{\mu}$ is taken, one has

$$E(\hat{\mu}) = \sum_{i=1}^{k} C_i\,E(Z_i) = \sum_{i=1}^{k} C_i\,(\mu + \sigma\xi_i^*) = \mu\sum_{i=1}^{k} C_i + \sigma\sum_{i=1}^{k} C_i\,\xi_i^* = \mu.$$

It is obvious that for $k = 2$, unless $Z_1$ and $Z_2$ are symmetric quantiles
(that is, unless $p_2 = 1 - p_1$, implying that $\xi_1^* + \xi_2^* = 0$),
$\hat{\mu}$ will not be unbiased for any choice of $C_1$ and $C_2$. Moreover,
for any even $k$, it has been shown that, in order for $\hat{\mu}$ to have
minimum variance, pairs of symmetric quantiles should be used (Reference 3).
Under this restriction, it is easily seen that the estimate of $\mu$ of the form

$$\hat{\mu} = \sum_{i=1}^{k/2} a_i\,(Z_i + Z_{k-i+1}), \qquad p_i + p_{k-i+1} = 1,$$

will be unbiased if $a_i > 0$, $i = 1, 2, \ldots, k/2$, and

$$\sum_{i=1}^{k/2} a_i = \frac{1}{2}.$$
By the method of maximum likelihood, the coefficients $a_i$ can be determined
to provide an unbiased estimate of $\mu$ with minimum variance for $k/2$
arbitrarily chosen pairs of symmetric quantiles. Finally, by varying the $p_i$
in the expression for the variance of $\hat{\mu}$, the orders of the particular
set of $k/2$ pairs of symmetric quantiles, as well as the coefficients, for
which $\operatorname{Var}(\hat{\mu})$ is a minimum can be determined.

By using the same technique, unbiased estimators of $\sigma$ from $k/2$ pairs
of symmetric quantiles can be obtained; these estimators are of the form

$$\hat{\sigma} = \sum_{i=1}^{k/2} \beta_i\,(Z_{k-i+1} - Z_i),$$

where $\beta_i > 0$ and

$$\sum_{i=1}^{k/2} \beta_i\,\xi_{k-i+1}^* = \frac{1}{2}.$$

For this case,

$$E(\hat{\sigma}) = \sum_{i=1}^{k/2} \beta_i\left[(\mu + \sigma\xi_{k-i+1}^*) - (\mu + \sigma\xi_i^*)\right] = 2\sigma\sum_{i=1}^{k/2} \beta_i\,\xi_{k-i+1}^* = \sigma,$$

since $\xi_i^* = -\xi_{k-i+1}^*$.

By again varying the $p_i$ in the expression for the variance of the
maximum-likelihood estimator of $\sigma$, the values of the $p_i$ and $\beta_i$
that minimize $\operatorname{Var}(\hat{\sigma})$ can be determined. Only
symmetric quantiles ($p_i + p_{k-i+1} = 1$) have been used to estimate
$\sigma$; it has been conjectured, but not yet proved, that this is the optimum
procedure to adopt. At present, it can be said that the $\hat{\sigma}$ obtained
in this manner is the maximum-likelihood minimum-variance unbiased estimator of
$\sigma$ using $k/2$ pairs of symmetric quantiles.


Further mathematical details are given in Reference 2, where estimators of
$\mu$ for $k = 1, 2, 3, \ldots, 20$ and estimators of $\sigma$ for
$k = 2, 4, 6, \ldots, 20$ are also given, having been derived by the method
described above. The efficiency of $\hat{\mu}$, as defined previously, is shown
in Reference 2 to increase from 0.637 for $k = 1$ to 0.994 for $k = 20$. The
efficiency of $\hat{\sigma}$ was shown to increase from 0.612 for $k = 2$ to
0.984 for $k = 20$.

Since $\mu$ is assumed to be unknown, an estimator for $\sigma$ cannot be
obtained using one quantile. The median is the only possible estimator of $\mu$
using one quantile if $\sigma$ is unknown, since $\xi_p^* \ne 0$ for values of
$p$ other than $p = 0.5$. In fact, the median is the minimum-variance
single-quantile estimator of $\mu$, even if $\sigma$ is known.

If $\mu$ is known, however, an estimator of $\sigma$ can be obtained using one
quantile. Both $Z(0.9424)$, the quantile of order 0.9424, and $Z(0.0576)$, the
quantile of order 0.0576, provide unbiased minimum-variance estimators, which
are given by

$$\hat{\sigma}_1 = 0.635\,[Z(0.9424) - \mu]$$

and

$$\hat{\sigma}_2 = -0.635\,[Z(0.0576) - \mu].$$

The efficiency of both estimators is 0.304.
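The single-quantile figures quoted above follow from the limiting variance of $Z_p$. The sketch below assumes the reference estimators are the sample mean (variance $\sigma^2/n$) for $\mu$ and, for $\sigma$ with $\mu$ known, an estimator with limiting variance $\sigma^2/2n$. Under those assumptions it should give the median's efficiency $2/\pi \approx 0.637$ and, by a grid search, an order near 0.9424 with coefficient $1/\xi^* \approx 0.635$ and efficiency near 0.304. The function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal parent; the efficiencies do not depend on mu or sigma


def median_efficiency():
    """(sigma^2 / n) / Var(median), with Var(median) ~ 0.25 sigma^2 / (n g(0)^2) = pi sigma^2 / 2n."""
    return STD.pdf(0.0) ** 2 / 0.25


def one_quantile_sigma_efficiency(p):
    """Efficiency of sigma_hat = (Z_p - mu) / xi_p* with mu known.

    Var(sigma_hat) ~ p(1 - p) sigma^2 / (n g(xi*)^2 xi*^2); the reference
    estimator is assumed to have variance sigma^2 / 2n.
    """
    xi = STD.inv_cdf(p)
    return (STD.pdf(xi) * xi) ** 2 / (2 * p * (1 - p))


grid = [0.5001 + i * 0.0001 for i in range(4998)]  # orders from 0.5001 to 0.9998
best_p = max(grid, key=one_quantile_sigma_efficiency)
print(round(median_efficiency(), 3))
print(round(best_p, 4), round(1 / STD.inv_cdf(best_p), 3),
      round(one_quantile_sigma_efficiency(best_p), 3))
```

By symmetry, the order 1 - best_p (near 0.0576) gives the same efficiency with a coefficient of the opposite sign.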

The attempt to achieve a high data compression ratio by the use of quantiles is
rewarded by the fact that the best estimators of $\mu$ and $\sigma$, using as
few as four quantiles, attain reasonably high efficiencies. With four
quantiles, $\mathrm{Eff}(\hat{\mu}) = 0.920$ and
$\mathrm{Eff}(\hat{\sigma}) = 0.824$; the estimators are

$$\hat{\mu} = 0.192\,[Z(0.1068) + Z(0.8932)] + 0.308\,[Z(0.3512) + Z(0.6488)]$$

and

$$\hat{\sigma} = 0.116\,[Z(0.9770) - Z(0.0230)] + 0.236\,[Z(0.8729) - Z(0.1271)],$$

as shown in Reference 2.
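The four-quantile efficiencies can be checked from the limiting covariances of the sample quantiles. This sketch assumes the same reference variances as for the single-quantile case ($\sigma^2/n$ for $\mu$, $\sigma^2/2n$ for $\sigma$) and uses the published coefficients, which are rounded to three figures. It should give values close to 0.920 and 0.824, with small differences attributable to that rounding. The function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # unit-variance normal parent; efficiencies are scale-free


def scaled_variance(orders, weights):
    """n * Var(sum_i w_i Z(p_i)) from the limiting covariances
    n Cov(Z(p_i), Z(p_j)) = p_i (1 - p_j) / (g(xi_i) g(xi_j)) for p_i <= p_j."""
    dens = [STD.pdf(STD.inv_cdf(p)) for p in orders]
    total = 0.0
    for i in range(len(orders)):
        for j in range(len(orders)):
            lo, hi = min(orders[i], orders[j]), max(orders[i], orders[j])
            total += weights[i] * weights[j] * lo * (1 - hi) / (dens[i] * dens[j])
    return total


def eff_mean(orders, weights):
    return 1.0 / scaled_variance(orders, weights)  # sample mean: n Var = 1


def eff_sd(orders, weights):
    return 0.5 / scaled_variance(orders, weights)  # assumed reference: n Var = 1/2


mu_orders, mu_weights = [0.1068, 0.3512, 0.6488, 0.8932], [0.192, 0.308, 0.308, 0.192]
sd_orders, sd_weights = [0.0230, 0.1271, 0.8729, 0.9770], [-0.116, -0.236, 0.236, 0.116]

# Unbiasedness: the C_i sum to 1; for sigma_hat, sum_i w_i xi_i* should be close to 1.
print(sum(mu_weights), sum(w * STD.inv_cdf(p) for p, w in zip(sd_orders, sd_weights)))
print(round(eff_mean(mu_orders, mu_weights), 3), round(eff_sd(sd_orders, sd_weights), 3))
```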

It can be seen, however, that the orders of the optimum quantiles for
estimating $\mu$ are quite different from those for estimating $\sigma$, and
similar disparities are apparent for all values of $k$. For small deviations
from the optimum values, the loss in efficiency of either estimator is
insignificant; this fortunate behavior is attributed to the flatness of the
surfaces representing $\operatorname{Var}(\hat{\mu})$ and
$\operatorname{Var}(\hat{\sigma})$ about the optimum quantile values.
Reference 2 discusses this flatness in more detail.
A more or less random choice of quantiles, however, can result in a serious
reduction in efficiency. For example, if the optimum four quantiles for
estimating $\sigma$ are used to estimate $\mu$, the efficiency of $\hat{\mu}$ is
reduced to 0.732, a loss in efficiency of 25.7 percent. And, if the optimum
four quantiles for estimating $\mu$ are used to estimate $\sigma$, the
efficiency of $\hat{\sigma}$ is reduced to 0.665, a loss of 19.2 percent. The
problem, then, becomes one of establishing a criterion of optimality on the
basis of which suboptimum sets of $k$ quantiles could be determined that would
provide unbiased estimators of both $\mu$ and $\sigma$ using the same quantiles.

The decision was made to use as this criterion the linear combination

$$\operatorname{Var}(\hat{\mu}) + b\,\operatorname{Var}(\hat{\sigma}), \qquad b = 1, 2, 3, \ldots$$

The set of $k/2$ pairs of symmetric quantiles that minimized this linear
combination was then defined as being suboptimum. For $b = 1, 2, 3$ and for
$k = 2, 4, 6, \ldots, 20$, the suboptimum sets of quantiles, in the above
sense, were determined, the estimators constructed, and the efficiencies
computed. These results are given in Reference 2.

The results for $k = 4$ are as follows.

For $b = 1$:

$$\hat{\mu} = 0.141\,[Z(0.0668) + Z(0.9332)] + 0.359\,[Z(0.2912) + Z(0.7088)],$$

$$\hat{\sigma} = 0.258\,[Z(0.9332) - Z(0.0668)] + 0.205\,[Z(0.7088) - Z(0.2912)],$$

with $\mathrm{Eff}(\hat{\mu}) = 0.908$ and $\mathrm{Eff}(\hat{\sigma}) = 0.735$.

For $b = 2$:

$$\hat{\mu} = 0.106\,[Z(0.0434) + Z(0.9566)] + 0.394\,[Z(0.2381) + Z(0.7619)],$$

$$\hat{\sigma} = 0.196\,[Z(0.9566) - Z(0.0434)] + 0.232\,[Z(0.7619) - Z(0.2381)],$$

with $\mathrm{Eff}(\hat{\mu}) = 0.876$ and $\mathrm{Eff}(\hat{\sigma}) = 0.779$.

For $b = 3$:

$$\hat{\mu} = 0.097\,[Z(0.0389) + Z(0.9611)] + 0.403\,[Z(0.2160) + Z(0.7840)],$$

$$\hat{\sigma} = 0.179\,[Z(0.9611) - Z(0.0389)] + 0.235\,[Z(0.7840) - Z(0.2160)],$$

with $\mathrm{Eff}(\hat{\mu}) = 0.857$ and $\mathrm{Eff}(\hat{\sigma}) = 0.792$.

It is readily seen that as $b$ increases, $\mathrm{Eff}(\hat{\mu})$ decreases
and $\mathrm{Eff}(\hat{\sigma})$ increases. Since larger values of $b$ give
greater weight to $\operatorname{Var}(\hat{\sigma})$, this is not a surprising
result. The point, however, is that the value of $b$ should logically be
determined by the relative importance of the two estimators. If $\hat{\sigma}$
is the important consideration, increasing the value of $b$ gives different
suboptimum sets of quantiles for which the efficiency of $\hat{\sigma}$ is
improved. If $\hat{\mu}$ is of paramount importance, $b$ should be small. Note
that five choices of $b$ are presently available (including $b = 0$ and, in
effect, $b = \infty$).

One may in some cases prefer to suffer a loss in efficiency rather than use
optimum minimum-variance quantiles. This is because, as $k$ increases, the two
optimum extreme quantile values, those of order $p_1$ and $p_k = 1 - p_1$, move
farther out on the tails of the distribution. Although from a theoretical
viewpoint this fact is of little consequence, from practical considerations
there are two major objections to this behavior of $p_1$ and $p_k$. First,
since $n$ is never infinite, the true distributions of the sample quantiles are
only approximately normal and, more important, the deviation from normality
becomes more pronounced the farther the quantiles move out on the tails of the
distribution. Second, the "normal" distributions that one encounters in
practical situations are very often only approximately normal, with deviations
from normality greater out on the tails than toward the center of the
distribution. Thus, it is important on both counts to investigate the effect on
the efficiency of the estimators when optimum and suboptimum estimates of $\mu$
and $\sigma$ are obtained with $p_1$ restricted to be not less than some
specified value. If the loss in efficiency is not excessive, it may well prove
advantageous to adopt the cautious policy of restricting the value of $p_1$,
and thus avoid or limit a bias in the estimates of $\mu$ and $\sigma$. Such a
bias can arise either from a sufficiently large deviation from the assumed
normality of the extreme quantile distribution or from erratic behavior out on
the tail of an approximately normal parent distribution.

Optimum and suboptimum estimators of $\mu$ and $\sigma$ were constructed with
$p_1$ restricted to be not less than 0.01 and again with $p_1$ not less than
0.025. The results were quite satisfactory in that losses in efficiency were
small. Although the restrictions on $p_1$ affect the efficiencies of
$\hat{\sigma}$ to a greater extent than those of $\hat{\mu}$, efficiencies
greater than 0.90 can be achieved for suboptimum estimators of $\sigma$ for
$k > 10$ when $p_1$ is restricted to be not less than 0.025. In fact, for this
case very little is gained by using more than 10 quantiles.

A type of statistical analysis closely related to the estimation of parameters
is the testing of hypotheses concerning the values of the parameters. For
example, one may wish to test the simple hypothesis that a normal population
with a known mean has a variance of $\sigma_0^2$ against the alternative
hypothesis that the variance is $\sigma_1^2$. A number of tests of this nature
have been devised using up to four quantiles. As in the case of estimators, one
naturally seeks to maximize the efficiency (relative reliability) of the test,
as defined above. As one might expect, it turns out that for tests concerned
with the mean, the optimum quantiles are those that maximize
$\mathrm{Eff}(\hat{\mu})$, and for tests concerned with the variance, the
optimum quantiles are those that maximize $\mathrm{Eff}(\hat{\sigma})$.
Consequently, not only have optimum test statistics been devised for these
tests, but suboptimum test statistics have also been devised using the
suboptimum quantiles for estimating $\mu$ and $\sigma$. The mathematical and
statistical details are given in References 4 and 5. Also included in these
reports are quantile estimators of the degree of correlation between two normal
populations.

Thus, it can be seen from the above discussion that, by using quantiles to 
obtain certain types of statistical information, not only can a significant amount 
of data compression be achieved, but, equally important, the reduction in uncer- 
tainty which accompanies a large sample size is also retained. The investigation 
into further statistical uses of quantiles is being continued at the Jet Propulsion 
Laboratory. We shall describe, hereafter, a system which allows the quantiles 
to be determined onboard a spacecraft. The simplicity of this system, coupled 
with the large data compression ratio achievable, makes the use of the quantiler 
appealing. 

DATA COMPRESSION RATIOS 

If the quantile data are to be a useful form of data compression, a quantiler
must be of simple construction. The quantiler design introduced below is one of
extreme simplicity; notably, no arithmetic operations are performed. As
discussed in the section entitled "Theoretical Background of the Quantiler," if
quantile data are to be useful, the loss involved in using quantiles instead of
the entire sample must be considered in relation to the degree of data
compression that can be achieved. The theory will be discussed from a practical
point of view, as it applies to the quantiler design.
A particle count experiment, the outcome of which has a discrete distribution
(an empirical distribution, or histogram), is used as an example. Before the
quantiles are determined, the data are grouped into a familiar histogram.
Quantiles of empirical distributions are called sample quantiles. Figure 4-1
shows a typical histogram with four such sample quantiles, of orders 0.067,
0.291, 0.709, and 0.933, for 1024 samples.

The sample quantiles are determined by first deciding how many quantiles are to
be computed and transmitted. As few as four sample quantiles convey a great
deal of useful information. The answers to the statistical questions "Which
quantiles should they be?" and "How good is the information obtained from the
four quantiles relative to that obtained from the entire sample?" depend on
what one wishes to compute.



Figure 4-1. A Typical Histogram, 1024 Samples 
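Because the data are first grouped into a histogram, the sample quantiles can be read off by accumulating bin counts until each target rank is reached. The sketch below is a software illustration only, not the JPL quantiler hardware. It uses the $[np]$ rank convention from the definition given earlier, clamped to at least 1 as an assumption; the function name is hypothetical.

```python
import math


def quantiles_from_histogram(counts, orders):
    """Return the sample quantiles Z_p of grouped integer data.

    counts[v] is the number of samples whose value (e.g. a particle count) is v.
    For fixed n and p the target ranks are constants, so only additions and
    comparisons are needed while scanning the histogram.
    ASSUMPTION: rank [np] = floor(n p), clamped to at least 1.
    """
    n = sum(counts)
    ranks = [max(1, math.floor(n * p)) for p in orders]
    result = []
    for rank in ranks:
        running = 0
        for value, c in enumerate(counts):
            running += c
            if running >= rank:
                result.append(value)
                break
    return result


histogram = [0, 2, 3, 5]  # illustrative: 10 samples with values 1..3
print(quantiles_from_histogram(histogram, [0.067, 0.5, 0.933]))
```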
It should be noted that transmitting quantiles is better than deriving the mean
and standard deviation at the test site and then transmitting these moments,
even when the greater design simplicity of a quantile system over an arithmetic
system is ignored. The quantiles convey additional knowledge of the
distribution. For example, the distribution might be bimodal (i.e., have two
local maxima) rather than unimodal. Such a condition would be detected using
the quantile method but not using the simple mean and standard deviation data.
Furthermore, goodness-of-fit tests have been devised for use with quantiles.
These are tests that tell an experimenter whether it is reasonable to assume
that a given set of experimental data arises from a supposed probability law;
this is an important type of statistical test in all phases of experimental
work. Discriminating between two similar distributions using quantiles extends
the applicability of quantiles to data compression still further. Further
mathematical details can be found in Reference 2.

The data compression ratios that can be achieved using four quantiles are
demonstrated by the following example. The 1024 samples are assumed to be
particle counts from 1024 sampling periods. The spread between samples is
assumed to be within 128; that is, no more than 127 particles arrive in any one
second. A sample size of 1024 is chosen because the variance of the estimates
of the mean and standard deviation is inversely proportional to the total number
of samples, and the best estimates are those with the smallest variance. In fact,
the total number of samples should be as large as possible, so long as the
incoming data remain stationary. The sample spread should also be large enough
to handle the maximum variability in the data rate. Both of these parameters
must be limited, however, to hold down the volume of hardware required. As for
the sample spread, additional hardware can be applied more efficiently to vary
the sampling time than to provide extra storage capacity. This self-adaptive
feature will be discussed later.

In comparing the transmission of all samples (7 x 1024 = 7168 bits) with the
transmission of four quantiles (4 x 7 = 28 bits) per sampling cycle, a compression
ratio of about 250:1 (exactly 256:1) results. However, consideration of the effect
of bit errors on the two methods of data gathering and transmission leads to the
conclusion that the signal-to-noise ratio for the quantile transmission should be
increased over that of the raw-data system. Doubling the signal-to-noise ratio, for
example, is equivalent to doubling the bit time at constant power; the compression
ratio would thus be reduced to about 125:1. A further reduction is demanded by the
quantile estimation efficiency. For example, with a typical efficiency of 80 percent,
the final data compression ratio is on the order of 100:1. By transmitting
1/100 as much data, estimates of the mean and standard deviation of the parent
population can be obtained with variances no higher than if the full sample of
uncompressed data were used. In effect, this means that 100 percent efficiency
is obtained with data compression ratios of 100:1.
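The arithmetic behind these ratios can be laid out step by step. The sketch below simply reproduces the figures given in the text (7-bit values, 1024 samples, four quantiles, a doubled bit time, and an 80 percent estimation efficiency); the variable names are illustrative.

```python
# Compression-ratio bookkeeping for the four-quantile example.
BITS_PER_VALUE = 7        # sample spread of 128 -> 7-bit values
N_SAMPLES = 1024
N_QUANTILES = 4

raw_bits = BITS_PER_VALUE * N_SAMPLES          # 7 x 1024 = 7168 bits
quantile_bits = BITS_PER_VALUE * N_QUANTILES   # 4 x 7 = 28 bits
nominal_ratio = raw_bits / quantile_bits       # 256, quoted as about 250:1

# Doubling the signal-to-noise ratio at constant power doubles the bit time,
# which halves the effective compression ratio.
snr_adjusted = nominal_ratio / 2               # 128, quoted as about 125:1

# An estimation efficiency of 80 percent costs a further factor of 0.8.
efficiency = 0.80
final_ratio = snr_adjusted * efficiency        # 102.4, on the order of 100:1

print(raw_bits, quantile_bits, nominal_ratio, snr_adjusted, final_ratio)
```

The exact figures are 256:1, 128:1, and 102.4:1; the text rounds them to 250:1, 125:1, and 100:1.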
PROTOTYPE QUANTILER


One important novelty in the quantiler design is the method by
which the histogram is formed. The other operational functions are described
with reference to the quantiler functional block diagram (Figure 4-2).

Histogram Storage 


A quantiler that generates the quantiles for a total of 1024 samples with a
sample range of 256 is described in the remainder of this paper. In order to
cover all possible sample distributions, prior technology has used 256 registers
of log2(1024) = 10 bits each, for a total of 2560 bits, to store the complete
histogram. With the histogram-forming method used by the quantiler, however,
only 1280 bits of storage are used. This reduction is accomplished by a linear
storage scheme in which 1024 spaces (binary zeros) are separated by 256
markers (binary ones) in a serial memory. The markers identify the address of
the memory location, and the number of spaces between markers represents the
number of times that the address appeared as a sample value.
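The marker/space storage can be mimicked in software to show why the memory size is fixed at 1280 bits regardless of the data. The sketch below assumes that the spaces belonging to an address immediately follow that address's marker (as in the loading procedure described later); the helper names are hypothetical.

```python
def encode_histogram(hist):
    """Serial marker/space form: one marker (1) per address, each followed
    by one space (0) for every time that address occurred as a sample value."""
    bits = []
    for count in hist:
        bits.append(1)
        bits.extend([0] * count)
    return bits


def decode_histogram(bits):
    """Invert encode_histogram by counting the spaces after each marker."""
    hist = []
    for bit in bits:
        if bit == 1:
            hist.append(0)
        else:
            hist[-1] += 1
    return hist


hist = [4] * 256                     # 1024 samples spread over 256 addresses
bits = encode_histogram(hist)
register_bits = 256 * 10             # 256 registers of 10 bits each
print(len(bits), register_bits)      # serial form versus register form
assert decode_histogram(bits) == hist
```

Because every sample adds exactly one space and every address exactly one marker, any histogram of 1024 samples over 256 addresses occupies 1024 + 256 = 1280 bits.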

Sampling Rate Controller 


The prototype quantiler was built with a particle counting experiment in 
mind. A source of particles randomly distributed in time was assumed so that 
the input to the quantiler is merely a counter. 

As was stated above, the sample range was 256, which means that an eight- 
bit counter is used. The input counter must be strobed periodically to sense 
each data point to be stored. By changing the time between the strobes, varying 
data rates can be accommodated. The criterion chosen for doubling the sampling
rate was 16 overflows of the sample counter during the recording of the
1024-sample histogram. To accommodate a falling data rate, where perhaps the
complete histogram would wind up in only a few lower slots, the highest sample
value is observed after the complete histogram has been recorded. If this highest
reading is lower than half of the allowable spread, the sampling rate is halved
before the next histogram is taken. This automatic control of the sampling rate
as a function of the incoming data rate can in fact be considered a secondary
data compression feature.
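The two adaptation rules can be summarized as a single decision made between histograms. This is a hypothetical sketch: the text does not say in which order the rules are checked, so the overflow rule is assumed to take precedence, and the function and parameter names are illustrative.

```python
def next_sampling_rate(rate, overflows, highest_value,
                       spread=256, overflow_limit=16):
    """Return the strobe rate to use for the next histogram.

    rate:          current sampling (strobe) rate, samples per second
    overflows:     sample-counter overflows while the histogram was recorded
    highest_value: largest sample value in the completed histogram
    """
    if overflows >= overflow_limit:
        return rate * 2        # counts too large: strobe twice as often
    if highest_value < spread // 2:
        return rate / 2        # counts too small: strobe half as often
    return rate


print(next_sampling_rate(1.0, overflows=20, highest_value=255))  # rate doubles
print(next_sampling_rate(1.0, overflows=0, highest_value=90))    # rate halves
```

Shortening the strobe interval keeps the per-strobe count inside the eight-bit range, while lengthening it spreads a sparse histogram back over more of the 256 slots.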

Block Diagram Analysis 


Figure 4-2 shows a block diagram of the prototype quantiler. The input
random data, derived from a "random pulse generator" (Reference 6), are counted
in the data counter and loaded in parallel into the hold register on command from
the sampling rate controller, where they are held until stored in the histogram
memory.

Figure 4-2. Quantiler and Associated Input/Output Test Equipment

The histogram memory is a 1280-bit recirculating delay line. The line is
resynchronized between successive histogram compilations by storing 256
consecutive ones followed by 1024 zeros. An eight-bit address counter counts the
ones during synchronization and during data loading to keep track of the address
at which the data are to be stored. A ten-bit cumulator counts the 1024 zeros
inserted during preload and also counts the samples taken in the histogram store
mode. As the delay line recirculates, the address counter counts the ones as
they pass. When the address counter reaches the value stored in the data hold
register, a coincidence circuit sends a command to the central control to insert
a zero just after the binary one that caused coincidence, and then to delay the
remaining bits of memory by one pulse. This is done by temporarily adding one
bit to the delay line until the last address marker has passed the loading point.
Since there is nothing but zeros from this point back to the first address
marker, deleting the extra bit at this time results in losing a zero, thus
restoring the line to the required 1024 zeros and 256 ones. This process is
repeated serially 1024 times, removing the 1024 zeros from the end of the line
and rearranging them between the ones according to the way in which the data
behaved.
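The loading procedure amounts to inserting a space after the matching marker and discarding one trailing space per sample. The following software model of the recirculating line is a sketch under assumptions: a Python list stands in for the delay line, markers are numbered from 0 so that a sample value v selects the (v + 1)th marker, and `preload`, `store_sample`, and `read_histogram` are hypothetical names.

```python
def preload(n_addresses=256, n_samples=1024):
    """Resynchronize the line: all address markers (1s) followed by all spaces (0s)."""
    return [1] * n_addresses + [0] * n_samples


def store_sample(line, value):
    """Store one sample value in the serial histogram line.

    A 0 is inserted just after the marker whose index (counting from 0)
    equals `value`; the line is then restored to its fixed length by
    dropping its last bit.
    """
    address = -1                              # plays the role of the address counter
    for position, bit in enumerate(line):
        if bit == 1:
            address += 1
            if address == value:              # coincidence with the hold register
                line.insert(position + 1, 0)  # line is temporarily one bit longer
                break
    else:
        raise ValueError("value exceeds the number of address markers")
    # Delete the extra bit at the end. As long as no more than n_samples values
    # are stored, this removes one of the spaces trailing the last marker.
    line.pop()


def read_histogram(line):
    """Count the spaces that follow each marker."""
    counts = []
    for bit in line:
        if bit == 1:
            counts.append(0)
        else:
            counts[-1] += 1
    return counts


samples = [3, 3, 0, 255, 7, 3]
line = preload(n_samples=len(samples))
for s in samples:
    store_sample(line, s)
hist = read_histogram(line)
print(len(line), hist[0], hist[3], hist[7], hist[255])
```

After n stored samples, the n preloaded spaces have all been worked off the end of the line, and the count after each marker equals the number of times that value occurred.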

After the cumulator counts the 1024th sample, the machine goes into the
"quantile compute" mode. In this mode the cumulator counts the zeros in the
delay line while the address counter still counts the ones. Since sample quantiles
are percentage points of the histogram area, and this area has been constrained
to be constant (namely, 1024), the quantiles can be computed simply by counting
a fixed number of samples. The four quantiles are thus hard-wired into four
different comparator circuits. When the cumulator count equals the value of a
desired quantile, a command is given to transfer the value of the address
register into the quantile register, where it is held ready for the telemetry
data buffer.
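The quantile compute mode is one pass over the line with two counters and a set of fixed comparator values. In the sketch below the comparator settings are hypothetical (the text does not give them); they are taken as round(p x 1024) for the orders 0.0668, 0.2912, 0.7088, and 0.9332 used in the estimation formulas, and markers are again numbered from 0.

```python
def compute_quantiles(line, targets):
    """Emulate "quantile compute" mode on a marker/space bit list.

    line:    1 = address marker, 0 = one stored sample
    targets: hard-wired cumulator counts, in increasing order
    Returns the address current when the cumulator reaches each target.
    """
    address = -1            # address counter: advanced by each marker
    cumulator = 0           # counts spaces, i.e. samples, seen so far
    pending = list(targets)
    quantiles = []
    for bit in line:
        if bit == 1:
            address += 1
        else:
            cumulator += 1
            while pending and cumulator == pending[0]:
                quantiles.append(address)   # transfer to the quantile register
                pending.pop(0)
    return quantiles


# Hypothetical comparator settings for the four quantile orders.
targets = [round(p * 1024) for p in (0.0668, 0.2912, 0.7088, 0.9332)]
line = []
for count in [4] * 256:          # a flat histogram of 1024 samples
    line += [1] + [0] * count
print(targets, compute_quantiles(line, targets))
```

No arithmetic beyond counting and comparison is involved, which is what allows the hardware to avoid an arithmetic unit.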

Display Devices 


The prototype quantiler was built with a CRT display of the histogram, with
the cumulative function and the four quantiles displayed in octal. Figures
4-3 and 4-4 are tracings of the pictures obtained from a binomial distribution
and a bimodal distribution. Even with only four quantiles, the bimodal nature
of the distribution in Figure 4-4 is evident from the quantiles alone.
Figure 4-3. Tracing of Picture Taken of a Binomial Distribution

Experimental Results

Tests were performed to determine the mean and standard deviation of 
known distributions by sampling and computing these parameters from the ob- 
served quantiles. The estimates of these parameters were calculated by the 
formulas (taken from the first section) 

$$\hat{\mu} = 0.141\,[Z(0.0668) + Z(0.9332)] + 0.359\,[Z(0.2912) + Z(0.7088)],$$

and

$$\hat{\sigma} = 0.258\,[Z(0.9332) - Z(0.0668)] + 0.205\,[Z(0.7088) - Z(0.2912)],$$

where Z(p) denotes the sample quantile of order p.
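The two estimators are weighted sums and differences of the four sample quantiles, so they can be checked directly against the table that follows. A minimal sketch, with a hypothetical function name, using the coefficients exactly as given above:

```python
def estimate_mean_sd(q0668, q2912, q7088, q9332):
    """Estimate the mean and standard deviation from the sample quantiles
    of orders 0.0668, 0.2912, 0.7088, and 0.9332."""
    mean = 0.141 * (q0668 + q9332) + 0.359 * (q2912 + q7088)
    sd = 0.258 * (q9332 - q0668) + 0.205 * (q7088 - q2912)
    return mean, sd


mean, sd = estimate_mean_sd(43, 47, 52, 59)
print(round(mean, 1), round(sd, 2))   # 49.9 5.15, the first Binomial row of Table 4-1
```

The weights in the mean estimator sum to one (2 x 0.141 + 2 x 0.359 = 1), so shifting all four quantiles by a constant shifts the estimated mean by the same constant.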
Figure 4-4. Tracing of Picture Taken of a Bimodal Distribution


The results of these tests are given in Table 4-1. In all cases the agreement
between the observed and the theoretical values is excellent. In fact, in the
case of the "renewal" data, the variance calculated from the quantiles disclosed
an error in the calculation of the theoretical variance.

Table 4-1

COMPARISON OF EXPERIMENTAL AND THEORETICAL RESULTS

Known Distribution          Sample Quantiles       Estimated       Average
                                                     μ      σ        μ      σ

Binomial                     43   47   52   59     49.9   5.15
  μ = 50                     43   47   52   58     49.8   4.89     49.8   4.90
  σ = 5.0                    43   47   52   57     49.6   4.64

Binomial                    109  118  130  140    124.1  10.46
  μ = 125                   109  118  131  142    124.8  11.18    124.5  10.71
  σ = 10.825                109  119  130  141    124.6  10.51

Overlapping Windows         104  116  133  149    125.1  15.10
  μ = 125                   103  116  133  149    124.9  15.35   124.99  15.29
  σ = 15.31                 104  116  132  151    125.0  15.41

Overlapping Windows         106  117  132  149    125.4  10.00
  μ = 125                   106  116  131  145    125.1  10.51   125.22  10.25
  σ = 10.45                 106  117  132  147    125.2  10.25

Renewal Process              61   67   75   82     71.9   7.06
  μ = 71.43                  61   67   75   83     71.3   7.32    71.47   7.23
  σ = 7.19                   61   67   75   83     71.3   7.32

APPLICATION OF THE QUANTILER

Although the motivation for developing the quantiler was the need for data
compression on deep space flights, the relevance of quantile technology to
nonspace telemetry situations became obvious. One use is in industrial quality
control. Suppose a product is being manufactured automatically in an inaccessible
location. Many quality control techniques use the mean and variance of certain
features of the manufactured product to detect departures from the state of
"statistical quality control." A quantiler at the data source would compute the
quantiles for transmission to the quality control engineer. From these quantiles,
the engineer would receive information not only on the mean and variance but
also on other features that may be even more important in quality control.

For example, if a bimodal distribution is detected, it could indicate that a ma- 
chine malfunction occurs part of the time, changing some statistical character- 
istics of a certain fraction of the output. 

Another possible application is the metering of utility usage to determine 
utility bills. Some utilities charge on the basis not only of average use but also of
fluctuations about the average. If the use per half-hour is recorded at the subscribing
facility and quantiles are formed locally once a week and transmitted to the 
utility, a charge can be made based on mean and variance. The rate commissions 
probably can be convinced to go along with such schemes. 

An application to automatic traffic control also exists. At remote sites 
throughout a city, the spectrum of vehicle velocities is taken. Quantiles of this 
velocity spectrum are transmitted to a central computer, which uses the information
to monitor traffic. For example, a low average velocity with a large
variance indicates an impending traffic jam. 

Other areas for civilian use will be mentioned briefly. Workers in highly 
radioactive environments can use quantiles to monitor the radiation spectrum. 
The Weather Bureau can use quantiles to give usable information on quarter-hour
temperatures at a given location without having to transmit the temperature
every 15 minutes.

Even in space applications there are other types of experiments (other than 
particle count) to which quantiles apply. Energy spectra of incoming particles 
are distributions and, as such, can be effectively transmitted by quantiles. The 
same applies to mass spectrograms in planetary surface life-detection experi- 
ments. 

In fact, quantiles afford an extraordinary saving whenever any kind of 
curves, spectra, or distributions have to be transmitted. We believe that the 
potential applications (especially in a technology where remote sites are linked 
together by computers) are indeed far-reaching. 
5. A STUDY OF DATA COMPRESSION BUFFERING

R. Hollenbaugh 
Goddard Space Flight Center 

Greenbelt, Maryland

N67-27407 

In a data compression system which removes redundancy from a digitally encoded source, a 
rate interfacing problem may be encountered. Since the compression operation depends on the 
data statistics, the output bit rate of the compressor will vary with time. If it is assumed that the 
source rate does not change and that the total number of bits required to encode the source can be 
successfully reduced, the average bit rate out of the processing equipment will be reduced. If the 
equipment receiving the output of the compressor requires a constant bit rate input, an interfacing 
function will be necessary. 

The simplest method of achieving rate compatibility is to place a buffer memory between the
data compressor and the readout equipment. Before the readout process is started, the buffer
memory is filled. The relationship between the buffer readin rate (determined by the success of
the compression operation) and the buffer readout rate (which is constant) determines
whether the buffer is being filled or emptied.

Figure 5-1 is a block diagram of a system utilizing a buffer memory for rate interfacing. 
The two control lines originating from the buffer prevent buffer overflow and buffer emptying. The 
INHIBIT control sets the source rate to zero when the buffer memory is full. The DEMAND con- 
trol is necessary to assure that the buffer does not empty before the source is exhausted. 


S = TV scan rate, in lines per second
B_l = bits required to encode one line of TV data before compression
Y = bits required to encode one line of TV data after compression
K = increase in scan rate for the data compression operation
X = buffer size, in bits

Figure 5-1. Block diagram of a system utilizing a buffer memory for rate interfacing
The bit compression ratio C_R = B_B/B_A is the ratio of the number of bits required to encode
the source before the compression operation, B_B, to the number of bits required to encode the
source after the compression operation, B_A. The INHIBIT control does not affect the C_R that the
system can achieve. The DEMAND control, however, can decrease the C_R achieved by the system;
exercising this loop increases the number of bits required to encode the source. Ideally, then, the
buffer memory should be large enough that it is not emptied before the source is exhausted.

The results described herein show how the C_R that the system can achieve is related to buffer
size and to the source redundancy that the compression operation can remove.

The controlled-rate source is a TV camera with 1000 lines and 1000 elements per line.
Each element requires six information bits to encode it, so an uncompressed picture requires
6,000,000 bits. The system variables are defined at the bottom of Figure 5-1. The readin rate
of the recorder is defined in terms of the system parameters.

In the analysis of the system operation, the buffer memory is separated from the system and
its operation is analyzed in terms of its input/output parameters. For a given distribution of the
redundancy within a picture, a piecewise analysis can be performed. With the assumption of a
constant readin rate, Equation 1 for T_E, the time required to empty the buffer, and Equation 2
for B_E, the total number of bits passing through the buffer before it is emptied, can be derived:

$$T_E = \cdots \qquad (1)$$

and

$$B_E = \cdots \qquad (2)$$

The more it is necessary to use the DEMAND control, the greater is the loss in the compression
ratio achieved. If B_A = B_E, no loss in C_R is realized, since the buffer does not become empty
before the source is exhausted. The worst-case distribution of redundancy is the distribution
which causes the buffer readin rate to be at its lowest for the longest time. This situation occurs
when all of the nonredundant information is located at the beginning of the picture and the remainder
of the picture is completely redundant. Equations 3 give the minimum buffer size necessary to
avoid a reduction in the system C_R as a function of U′, the number of initial lines of nonredundant
data:

$$X \geq 20{,}265 + 6{,}735\,U'$$

and

$$X \geq 579{,}000 - 579\,U' \qquad (3)$$

(Whichever equation gives the smaller value of X governs.)

The above equations assume a minimum C_R of 6/7, a maximum C_R of 2000/7, and K = 10.
From C_R = B_B/B_A one can derive

$$C_R = \frac{1}{F_a/C_a + F_b/C_b + \cdots} \qquad (4)$$

where F_a is the fraction of the original picture which can be compressed with a bit compression
ratio of C_a, F_b is the fraction of the picture which can be compressed with a bit compression ratio
of C_b, and so forth.
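Equation 4 is a weighted harmonic mean of the individual compression ratios. The sketch below evaluates it for the worst-case mix used in this paper (U′ nonredundant lines out of 1000 at C_R = 6/7, the rest at 2000/7) and compares the result with the closed form given below as Equation 7; the function name is hypothetical.

```python
def overall_cr(fractions, ratios):
    """Equation 4: C_R = 1 / (F_a/C_a + F_b/C_b + ...)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("the fractions F must cover the whole picture")
    return 1.0 / sum(f / c for f, c in zip(fractions, ratios))


u = 100                                   # initial nonredundant lines (of 1000)
cr = overall_cr([u / 1000, 1 - u / 1000], [6 / 7, 2000 / 7])
closed_form = 6_000_000 / (21_000 + 6_979 * u)
print(round(cr, 3), round(closed_form, 3))
```

The agreement follows because 6/7 corresponds to 7000 bits per line and 2000/7 to 21 bits per line, so the compressed picture needs 7000U′ + 21(1000 - U′) = 21,000 + 6,979U′ bits.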
For a given distribution of redundancy, the achieved compression ratio C_R can be calculated
by using Equation 4. Using Equation 4, assuming the worst-case distribution of redundancy and
K = 10, one obtains

$$C_R = \frac{210{,}000}{6{,}979\,U' - X + 21{,}000} \qquad (5)$$

for X ≥ 7000U′; also,

$$C_R = \frac{6{,}000{,}000}{600{,}000 + 6{,}400\,U' - X} \qquad (6)$$

for X ≤ 7000U′.

The compression ratio that could be achieved without the buffering effect is

$$C_R = \frac{6{,}000{,}000}{21{,}000 + 6{,}979\,U'} \qquad (7)$$

For the case where the redundancy is evenly spread throughout the picture,

$$C_R = \frac{6{,}000{,}000}{\cdots} \qquad (8)$$

by use of Equation 6.

In Equation 8, C_RI is the compression ratio achieved by the system if no buffering effect
occurs. Again we assume K = 10.
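The worst-case results can be tabulated directly. The sketch below codes Equations 3, 5, 6, and 7 under stated assumptions: Equation 5 is taken to apply for X ≥ 7000U′ and Equation 6 for X < 7000U′ (the two agree at X = 7000U′), the smaller of the two bounds in Equation 3 is taken as the minimum buffer size, and the achieved ratio is capped at the no-loss value of Equation 7. Function names are hypothetical, and K = 10 throughout.

```python
def ideal_cr(u):
    """Equation 7: ratio with no buffering loss, for u initial nonredundant lines."""
    return 6_000_000 / (21_000 + 6_979 * u)


def min_buffer(u):
    """Equation 3, taking the smaller of the two bounds as the governing one."""
    return min(20_265 + 6_735 * u, 579_000 - 579 * u)


def achieved_cr(x, u):
    """Equations 5 and 6: worst-case achieved ratio for buffer size x (bits)."""
    if x >= min_buffer(u):
        return ideal_cr(u)                                # buffer never runs dry
    if x >= 7_000 * u:
        cr = 210_000 / (6_979 * u - x + 21_000)          # Equation 5
    else:
        cr = 6_000_000 / (600_000 + 6_400 * u - x)       # Equation 6
    return min(cr, ideal_cr(u))                          # never exceed the no-loss value


for u in (0, 50, 100, 500):
    print(u, min_buffer(u), round(achieved_cr(0, u), 2), round(ideal_cr(u), 2))
```

With no buffer at all (X = 0) and a completely redundant picture, the achieved ratio is limited to K = 10, compared with the 2000/7 (about 286) that would otherwise be possible.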

In summary, two specific cases of redundancy distribution have been analyzed. The effect of
the buffer size was found to be appreciable only for cases of high compression ratios. The value
of K was found to have a strong influence on system operation. The buffer can be much smaller
than the picture and still perform effectively.
6. RESULTS OF ADAPTIVE PREDICTOR STUDIES. II

J. A. Sciulli 

Goddard Space Flight Center 
Greenbelt, Maryland 


N67-27408

INTRODUCTION

In recent years studies of data compression have warranted the attention of many investi- 
gators. Since demands for large amounts of scientific data are increasing, methods for more 
efficient data transmission must be developed. Usually a communications system is designed so 
that the information source is sampled at a constant rate determined by the most active data peri- 
ods. During a large percentage of time the data are relatively quiescent and so redundant samples 
are transmitted. Data compression by prediction is a promising method of redundancy removal 
and is therefore the subject of many recent studies. A survey of the literature shows that two 
philosophies are being proffered for the solution to this problem. The first approach might be 
called the "state of the art" point of view, in which efforts have been focused on studying well-known,
easily implemented techniques such as the zero-order and first-order predictors. References 1
and 2 are typical examples of this point of view. The philosophy of this approach is that the
simpler schemes are within "state of the art" spacecraft instrumentation capability and are certainly
easier to study and simulate. Those who have chosen this route generally feel that more
sophisticated approaches are too complex to have any application value.


The second school of thought seeks a sounder theoretical foundation for the solution of the data
compression problem; the work of Balakrishnan (Reference 3) represents this latter philosophy.
It is true that this approach at present appears difficult to instrument for spacecraft use;
nonetheless, it is highly desirable to concentrate on the more sophisticated methods, especially
since a high degree of onboard data processing capability (e.g., random access memory and
arithmetic capability) will be available in the future.


This report is intended to develop part of the work reported in Reference 3 as well as to com- 
plement the work reported in Reference 4. It deals with the description, simulation, and analysis 
of the results of the application of the conditional expectation predictor to the compression of video 
data and presents observations on alternate methods and possible applications of the findings. 
Suggestions for future study are given in the concluding section. 


THEORY OF ADAPTIVE PREDICTION SYSTEM 

Before describing the prediction mechanism it would be worthwhile to state a few definitions 
and observations. The word "adaptive” implies modification to meet new conditions. A truly 
adaptive system is characterized by its ability to ( 1 ) monitor its own performance with respect to 
some performance criterion, (2) learn of new conditions, and (3) adjust its structure to fit the new 
conditions. In a real communications system no a priori knowledge of the statistical structure of 
the information source is available. The data compression technique to be described in this re- 
port satisfies the definition of adaptivity and also requires no a priori statistical knowledge of the 
data. 

Consider a sequence of discrete samples of the form shown in Figure 6-1. Assume that each
sample may take any one of Q discrete values. Suppose a random variable X is defined such that

$$X = (x_{i-M}, x_{i-M+1}, \ldots, x_{i-1}) \qquad (1)$$
Figure 6-1. Sequence of discrete data samples (labels: memory M, learning period L, past data, future data)


The sample space size of the random vector X depends on the choice of the memory size M. Since
each sample may assume any one of exactly Q discrete values, the sample space size of X for
a memory size M is simply

$$S = Q^M. \qquad (2)$$


Suppose, in addition, a second random variable y is defined such that

$$y = x_j, \qquad 1 \le j \le Q, \qquad (3)$$

and corresponds to the data sample immediately succeeding X. Assume that we have been observing
and recording the immediate successors to the random variable X over a number of samples
denoted by L, the learning period, and that our operation must determine the optimal prediction for
the ith sample. The optimal prediction x̂_i is given by


$$\hat{x}_i = E\left[\,y \mid X = (x_{i-M}, x_{i-M+1}, \ldots, x_{i-1})\,\right] \qquad (4)$$

The notation on the right-hand side of the equal sign is read as the "conditional expectation of
y given X = (x_{i-M}, x_{i-M+1}, ..., x_{i-1})." In the case of discrete data, x̂_i is given simply by

$$\hat{x}_i = \sum_{j=1}^{Q} x_j \, P\left[\,y = x_j \mid X = (x_{i-M}, x_{i-M+1}, \ldots, x_{i-1})\,\right] \qquad (5)$$

where x_j is a possible successor to X and P[y = x_j | X = (x_{i-M}, x_{i-M+1}, ..., x_{i-1})] is the
probability that y = x_j given X = (x_{i-M}, x_{i-M+1}, ..., x_{i-1}). If the data are assumed to be a long
sample from an ergodic process, Equation 5 represents the "best" RMS predictor, since the mean is
the point about which the second moment is minimized.


To illustrate by example, assume that k observations of a particular X are made and that at
each observation the value of the immediate successor to X is recorded. Suppose a prediction is
required for the immediate successor to the (k + 1)st observation of this particular X. According
to Equation 5, the optimal prediction is the mean of the sample of past successors to X and is given
by

$$\hat{x}_{k+1} = \sum_{j=1}^{Q} x_j \, \frac{k_j}{k} \qquad (6)$$

where x_j is a possible successor to X, 1 ≤ j ≤ Q, and k_j is the number of times x_j was observed.
Actually, one could choose a statistic other than the mean and correspondingly minimize
some prediction error criterion other than the mean square error. The mode, for example, could
be used as the prediction for the immediate successor to the random variable X. Using the
mode as the predictor minimizes the probability of error. To implement the mode predictor,
histograms representing the distribution of the immediate successor to each particular X
in the sample space are constructed. The most frequent successor then becomes the prediction.*
One could also choose the median as the prediction; choice of the median minimizes the mean
absolute error. It is interesting to note that if the data were both Gaussian and stationary, the
mode, median, and mean would produce identical prediction results.
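Given the list of successors recorded for one particular X, the three candidate predictions differ only in the statistic applied. A minimal sketch using Python's `collections.Counter`; the function name is hypothetical, and the lower median is used so that the prediction is always an attainable quantum level (an assumption, since the text does not say how ties are resolved).

```python
from collections import Counter


def successor_predictions(successors):
    """Mean (Equation 6), mode, and median of the successors recorded for
    one particular value of X."""
    counts = Counter(successors)                       # k_j for each level x_j
    k = len(successors)
    mean = sum(x_j * k_j for x_j, k_j in counts.items()) / k
    mode = counts.most_common(1)[0][0]                 # most frequent successor
    ordered = sorted(successors)
    median = ordered[(k - 1) // 2]                     # lower median: always a level
    return mean, mode, median


print(successor_predictions([5, 5, 6, 7, 5, 9]))
```

For this skewed set of successors the mean (about 6.17) falls between quantum levels, while the mode and median both give 5.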


COMPUTER SIMULATION OF CONDITIONAL EXPECTATION PREDICTOR 

The results of this work were obtained from simulations on the IBM 7094 computer using
Tiros TV cloud-cover picture data as the information source. Reference 4 contains a good deal
of background information on these data, including their origin and subsequent formatting for
computer simulation. A Tiros TV picture is nominally a 500-scan-line picture with each line
composed of 500 TV picture elements. This study has been made on 10 meteorologically significant
Tiros TV cloud-cover pictures (Figures 6-12 through 6-21); these are the same pictures as
those used for the study reported in Reference 4. Results have been obtained with each TV
element quantized to 4 and 6 bits.


Figure 6-2. Memory cell geometries for Q = 16 and M = 1, 2, and 3 (panel (a): Q = 16, M = 1, S = 16; next panel: Q = 16, M = 2, S = 256)


* This technique was implemented by Davisson of Princeton during his participation in the 1965 Goddard Summer Workshop (Reference 5).
His results in some cases were somewhat better than those obtained using the mean as the prediction.
Assume that the video data are to be scanned one element at a time from left to right and top
to bottom, beginning with the top leftmost TV element. The choice of the parameter M (memory
size) determines the number of M-dimensional cubes (called M-cubes in this paper) which are
required to store the statistical structure of the data. For example, if the data are quantized to
Q = 16 levels and a memory size M of 2 is chosen, then exactly Q^M = 16^2 = 256 2-cubes are
required. Figure 6-2 shows memory cell geometries for Q = 16 and M = 1, 2, and 3.* The process
begins by scanning the data one element at a time and observing the random variable X. At each
observation of X, the prediction for its immediate successor is computed from the statistics stored
in the M-cube associated with the particular X under observation. The prediction error is given by

$$E_p = |x_a - x_p| \qquad (7)$$

where x_a is the actual value and x_p is the predicted value. If E_p ≤ T, where T is a preset allowable
error threshold, the element is predictable and need not be transmitted. If, however, E_p > T, this
particular element is not predictable and must be transmitted in unmodified form.

The data compression system at the transmitter end must provide the receiver with the data
necessary to reconstruct the original message within the allowable error threshold T. To
accomplish this, the predictor at the transmitter end must operate on exactly the same data that it
will send to the receiver for reconstruction. Therefore, if an element is predictable (E_p ≤ T), it need
not be transmitted, but the predicted value is treated as though it were the actual value and is also
used to update the statistics stored in the M-cube defined by the X under observation. If, however,
an element is not predictable (E_p > T), the actual value is used to update the statistics. This is
called "closed-loop" operation. The prediction mechanism could also be evaluated in the "open-loop"
mode. In open-loop operation, predicted values do not replace actual values; thus the predictor
operates on raw data only. The studies described in this report, however, were done in the
closed-loop mode.

The example given previously described the formulation of the sample mean in terms of Equation
6. In the computer simulation it is not necessary to keep track of the relative frequency terms
k_j/k, because each M-cube defined by X can be composed of two storage locations, a sum location
and a counter location. At each observation of X, the sum location corresponding to this X is
updated by adding to the existing sum either the actual or the predicted value of the successor to X,
depending on whether the element is predictable. At the same time, the corresponding counter
location is incremented by one count for each observation of X. The k_j terms of Equation 6 are
thus implicitly contained in the sum at all times. Therefore the prediction computation need be
performed only when a prediction is required, and it is easily obtained by dividing the sum by the counter.
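The closed-loop scheme and the sum/counter storage fit together in a few lines. The following is a hypothetical sketch for a single scan line, not the IBM 7094 program: it assumes each M-cube is seeded on first use with one pseudo-observation equal to the most recent element of X (a zero-order-hold style prelearning assumption), that predictions are the sum divided by the counter rounded with Python's `round()`, that the first M samples are always sent, and that the receiver is told which samples were transmitted by one flag per sample. All names are illustrative.

```python
def compress_line(samples, memory=2, threshold=1):
    """Transmitter: closed-loop conditional-expectation prediction.

    Each M-cube (one per value of X) holds a running sum and a counter.
    Returns (reconstructed, sent); sent[i] is True if sample i is transmitted.
    """
    sums, counts = {}, {}
    recon = list(samples[:memory])            # first M samples are always sent
    sent = [True] * len(recon)
    for i in range(memory, len(samples)):
        x = tuple(recon[i - memory:i])        # X is built from reconstructed data
        if x not in counts:                   # prelearning assumption
            sums[x], counts[x] = x[-1], 1
        predicted = round(sums[x] / counts[x])
        if abs(samples[i] - predicted) <= threshold:
            value, transmit = predicted, False   # predictable: not transmitted
        else:
            value, transmit = samples[i], True   # unpredictable: sent unmodified
        sums[x] += value                         # closed loop: update with the value
        counts[x] += 1                           # the receiver will also hold
        recon.append(value)
        sent.append(transmit)
    return recon, sent


def decode_line(received, sent, memory=2):
    """Receiver: rebuild the line by running the same predictor."""
    sums, counts = {}, {}
    values = iter(received)
    recon = []
    for i, transmit in enumerate(sent):
        if i < memory:
            recon.append(next(values))
            continue
        x = tuple(recon[i - memory:i])
        if x not in counts:
            sums[x], counts[x] = x[-1], 1
        value = next(values) if transmit else round(sums[x] / counts[x])
        sums[x] += value
        counts[x] += 1
        recon.append(value)
    return recon


samples = [3, 3, 3, 4, 3, 8, 8, 8]
recon, sent = compress_line(samples)
received = [s for s, t in zip(samples, sent) if t]
assert decode_line(received, sent) == recon
print(recon, sent)
```

Because the transmitter updates its statistics with exactly the values the receiver reconstructs, both sides hold identical sums and counters at every step, which is the point of closed-loop operation.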

Because the learning period includes only a finite amount of past data, a prediction for the
successor to a particular value of the random variable X could frequently be indeterminate
because of a complete lack of past information; this is especially true at the beginning of the
learning process. One solution might be to determine a prediction from the statistics contained in
the M-cubes neighboring the particular M-cube defined by the X under observation. This approach,
however, does not solve the problem at the beginning and in the very early stages of the learning
period. The obvious solution, then, is to make some initial assumption for the successor to each of
the values which the random variable X can assume before the learning process begins. If the
initial assumption turns out to be a poor one, it will affect the efficiency of the prediction
mechanism less and less as more and more of the data are observed. This scheme was
used in the simulation of the conditional expectation predictor, and the prelearning
assumption was chosen on the basis of the results of the zero-order hold predictor (Reference 4).


* For memory sizes larger than 3, the number of storage locations required becomes unwieldy for practical computer simulations.
Experiments with the zero-order hold predictor showed that very frequently an element
intensity was within ±1 or ±2 quantum levels of its predecessor. Thus, if the random variable X
associated with the conditional expectation predictor is a 2-dimensional random vector

$$X = (x_r, x_{r+1}), \qquad (8)$$

the prelearning assumption for the successor to the (r + 1)st element is the (r + 1)st element itself.
Similarly, if X = (x_{r-1}, x_r, x_{r+1}), the prelearning assumption would again be x_{r+1}.

Quite often a suitable prediction cannot be derived from the statistics contained in the particular cube defined by the X under observation. When this occurs it is possible to utilize the neighborhood statistics as the source of a secondary prediction. For example (Figure 6-3), suppose the random variable under observation is

$$X = (i, j), \qquad 1 \le i \le Q, \quad 1 \le j \le Q,$$

where i and j are values of element intensity specifying the coordinates of a specific 2-cube in the memory array. Suppose that the conditional expectation calculated from the statistics contained in (i, j) is inadequate; that is, $|E_p| > T$. As soon as it is determined that the prediction error $E_p$ exceeds the threshold T, a secondary prediction is provided by computing the mean of the statistics contained in the 2-cubes in the neighborhood of cube (i, j). The boundaries of the neighborhood are governed by the allowable prediction error threshold T so as to accommodate the fidelity criterion. For example, if T is ±1 quantum level and a suitable prediction cannot be made from cube (i, j), the cubes which are not more than ±1 quantum level away from (i, j) are those from which the secondary prediction is determined. The concept of providing a secondary prediction if the primary prediction fails is in itself attractive, but this attractiveness is somewhat dulled when one considers that the use of alternate prediction modes in the same compression mechanism complicates the coding problem, since the receiver must determine the source of each prediction.
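A possible form of the secondary (neighborhood) prediction is sketched below for M-cubes stored as dictionaries of sums and counts keyed by intensity tuples. Two details are not fixed by the text and are assumptions here: the "mean of the statistics" is taken as the pooled mean (total of all sums divided by total of all counts) over every cube within ±T levels in each coordinate, including cube (i, j) itself, and intensities run from 0 to Q - 1 rather than 1 to Q. The function name is hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Cube = Tuple[int, ...]


def neighborhood_prediction(
    sums: Dict[Cube, float],
    counts: Dict[Cube, int],
    cube: Cube,
    t: int,
    q: int,
) -> Optional[float]:
    """Pooled mean of the successor statistics in all cubes within +/-t
    quantum levels of `cube` in every coordinate (levels 0 .. q-1).
    Returns None when no neighboring cube holds any observations."""
    ranges = [range(max(0, c - t), min(q - 1, c + t) + 1) for c in cube]
    total, n = 0.0, 0
    for neighbor in product(*ranges):
        total += sums.get(neighbor, 0.0)
        n += counts.get(neighbor, 0)
    return total / n if n else None


sums = {(10, 11): 24.0, (11, 11): 12.0, (14, 14): 100.0}
counts = {(10, 11): 2, (11, 11): 1, (14, 14): 1}
print(neighborhood_prediction(sums, counts, (10, 10), t=1, q=64))  # 36 / 3 = 12.0
```

Cube (14, 14) lies outside the ±1 neighborhood of (10, 10) and is ignored. Because the receiver must know whether primary or neighborhood statistics were used, each accepted prediction would also carry a mode indication, which is the coding cost the text points out.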

Figure 6-3-Two-dimensional memory cube and its neighboring cubes

The discussion thus far assumes that the TV data are observed serially one element at a time, scanning from left to right. There is some advantage, however, in observing the data not only from left to right along a TV line but also from line to line so as to take advantage of the vertical correlation in the TV data. Figure 6-4 depicts the geometry of the TV data.

If one wishes to operate the prediction mechanism only on data scanned serially from left to right, the random variable X would take the form of an ordered pair of adjacent elements on the same line; e.g., typically, $X = (x_{i,j}, x_{i,j+1})$. If, however, one wishes to take advantage of line-to-line correlation, X might consist of an ordered pair of TV elements of the form $X = (x_{i,j}, x_{i+1,j+1})$, where the element to be predicted is $x_{i,j+1}$. This scheme might be termed an elementary, two-dimensional predictor.

So far not too much has been said about 
the learning operation. Actually, the learning 
operation of the conditional expectation predic- 
tor (Method II of Reference 3) is not so ex- 
plicit as that of the linear predictor (Method I 
of Reference 3). In Method I the learning period is composed of about 20 data samples preceding 
the elements to be predicted. The function of this learning period is to develop an optimal opera- 
tor based on these 20 previous points. In Method II, however, the function of the learning period 
is to determine the optimal operation to predict the successor to the present observation of the 
random variable X. This is the basic difference between Method I and Method II. Method I determines an optimal operator based on a few points preceding the elements to be predicted, while
Method II determines the optimal operation based on previous observations of the successor to the
particular X under observation. Also, for Method I either a linear or a nonlinear operation is 
explicitly chosen. For example, Method I as it is described in Reference 4 is very obviously linear. 
Method II, however, does not distinguish between linear and nonlinear operations. The conditional 
expectation predictor simply proceeds to the optimum operation without restriction to either linear 
or nonlinear operation. 

Since the learning period of the linear predictor is used to determine an operator over a 
fairly small number of previous data samples and the learning period of the conditional expecta- 
tion predictor is used to observe occupancies of a relatively large number of M-cubes, it seems 
reasonable that the second method should require a much larger learning period than that required 
by the first method. Results from computer simulations included in the discussion of results appear 
to support this argument. It is important to note that in Method I the optimal operator is found over 
a learning period just preceding the sequence of elements to be predicted and a new learning process does not begin until the mean square prediction error exceeds a preset threshold. In Method II,
however, the learning process is more continuous in nature and prediction and learning take place 
almost simultaneously. 

Method I as described in Reference 4 utilizes two thresholds. The first is the threshold T,
which is the allowable error between true and predicted values of a data sample. The second 
threshold is associated with the mean square prediction error which is calculated periodically to 
determine the prediction ability of the present operator. When the mean square prediction error 
exceeds this threshold, the prediction mechanism is signaled to restart its learning operation. 

Thus far in the description of Method II only one threshold has been mentioned. This is the thresh- 
old T which corresponds exactly to the first threshold of Method I. Method II in its present con- 
figuration does not employ a second threshold equivalent to that of Method I. 

Figure 6-4-Geometry of TV data

In the first simulation of the conditional expectation predictor the learning-period length was chosen based on parametric trials with the element compression ratio serving as the figure of merit. Figure 6-5 shows these results with the data quantized to 6 bits per TV element, the memory M = 1 TV element, and an allowable error threshold T of ±2 quantum levels, where cumulative element compression ratio is the 4800-line (10-TV-picture) average compression ratio. This is not an ideal way to handle the learning operation. It might be worthwhile to monitor the mean square prediction error and introduce a second threshold as in Method I. The problem with this, as in the present configuration, is that the start of a new learning period causes a large instantaneous drop in the amount of statistical data available with which to make predictions.



Figure 6-5— Cumulative element compression ratio 
versus learning period (in TV lines) with Q = 64 
(6 bits/element), M= 1, T= ±2 quantum levels 


A solution free from this problem is to allow the statistical structure to decay slowly to some effective N-element average. Each M-cube of the memory array is composed of a summer and a counter. Suppose the counter is allowed to build up freely to N observations and future observations are handled as follows. Let

$\sigma_N$ = sum contained in the sum location after N observations

and

$\rho_{N+1}$ = (N + 1)st sample.

Then at the (N + 1)st observation $\sigma_N$ is replaced by

$$\sigma_{N+1} = (\sigma_N + \rho_{N+1})\,\frac{N}{N+1}.$$

Furthermore,

$$\sigma_{N+2} = (\sigma_{N+1} + \rho_{N+2})\,\frac{N}{N+1} = (\sigma_N + \rho_{N+1})\left(\frac{N}{N+1}\right)^2 + \rho_{N+2}\,\frac{N}{N+1}, \tag{9}$$

$$\sigma_{N+3} = (\sigma_{N+2} + \rho_{N+3})\,\frac{N}{N+1},$$

and so on. Thus the most recent observation is weighted most significantly; the second most recent observation, the second most significantly; and so forth.
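The update of Equation (9) reduces to a single rule per observation once the counter has reached N. The Python sketch below applies it to one M-cube's summer and counter; the function name is hypothetical, and the cube's conditional expectation is assumed to be the sum divided by the count, as for an ordinary average.

```python
from typing import Tuple


def observe(sigma: float, count: int, rho: float, n: int) -> Tuple[float, int]:
    """Add sample rho to an M-cube whose summer holds sigma after `count` observations.

    Up to n observations the summer and counter build up freely. After that the
    summer follows Equation (9), sigma <- (sigma + rho) * N / (N + 1), so the
    counter stays at n and older samples carry successive powers of N / (N + 1).
    """
    if count < n:
        return sigma + rho, count + 1
    return (sigma + rho) * n / (n + 1), n


sigma, count = 0.0, 0
for rho in [3, 3, 3, 3, 8, 8]:  # N = 4: four samples of 3, then a step to 8
    sigma, count = observe(sigma, count, rho, n=4)
    print(round(sigma / count, 3))  # 3.0 four times, then 4.0 and 4.8
```

A constant input leaves the summer at N times that value, so the estimate does not drift; after a step change the estimate moves toward the new level without the abrupt loss of statistics that a restarted learning period would cause.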


DISCUSSION OF RESULTS 

Figures 6-12 to 6-21 are copies of the Tiros TV cloud-cover pictures used in this study. 
These 10 pictures are the same as those used in the study reported in Reference 4. The back- 
ground of the original analog data, the construction of the unmodified digital pictures, and the de- 
scription of the display of these same pictures after processing with the prediction mechanism are 
also contained in Reference 4. The pictures which appear in this report probably will have lost 
some of the linearity of the gray scale because of the reproduction process but their overall qual- 
ity should not be degraded because of the large number of gray scales present. 

The complete data compression system embraces two problems, the prediction problem and
the coding problem. Although they are not independent, it is possible to think of them as two dis- 
tinctly separate problems. In order to separate them one needs to impose the constraint on the 
prediction mechanism that it at least does not hamper the coding mechanism in reasonably repre- 
senting the data. With this consideration in mind the results obtained so far can be presented in 
two parts. The first section will deal with the characteristics of the prediction mechanism with 
element compression ratio as the standard of comparison. The second section presents some pos- 
sible approaches to the coding problem with bit compression ratio as the standard of comparison. 


Results of Simulations of Prediction Mechanism 

Since it was assumed that the prediction problem and the coding problem were separate, the 
objective of the simulation of the prediction technique was to maximize the element compression 
ratio. Element compression ratio is defined as the ratio of the total number of TV elements in the 
original unmodified picture to the total number of unmodified TV elements which must be trans- 
mitted after the picture is processed by the prediction mechanism. Element compression ratio 
then is simply a measure of the "predictability" of the data and certainly does not include coding 
considerations. The objective of the initial work was to simulate the technique and evaluate the 
results with the element compression ratio serving as the figure of merit. 
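As a concrete illustration of this definition, the sketch below computes the element compression ratio for the zero-order hold predictor used as a baseline in this study. It assumes the common form of the zero-order hold, in which the prediction for each element is the last transmitted element and an element is transmitted unmodified whenever it differs from that value by more than T. Whether the simulations restarted the hold at the start of each TV line is not stated, so the input is treated as one sequence; the function name is hypothetical.

```python
from typing import Sequence


def zero_order_hold_ratio(samples: Sequence[int], t: int) -> float:
    """Element compression ratio C_E = total elements / transmitted elements
    for a zero-order hold predictor with allowable error +/-t quantum levels."""
    if not samples:
        raise ValueError("need at least one element")
    transmitted = 0
    held = None  # last transmitted value = prediction for the next element
    for x in samples:
        if held is None or abs(x - held) > t:
            held = x  # prediction failed (or first element): send it unmodified
            transmitted += 1
    return len(samples) / transmitted


line = [10, 11, 12, 13, 20, 20, 21]
print(zero_order_hold_ratio(line, t=2))  # 7 elements, 3 transmitted
```

Only the count of transmitted elements enters $C_E$; how the receiver learns which elements were predicted is a coding question and is deliberately left out, as in the text.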


Learning Period Considerations 

Table 6-1 shows element compression ratios for the basic prediction scheme with the data 
quantized to 6 bits per TV element, M = 1, T = ±2 quantum levels, and learning periods varying 
from 2 TV lines to 480 TV lines in one picture. There certainly are no significant gains in com- 
pression ratio for any of the learning periods used. However, as a first choice one might pick a 
learning period length of 256 samples (approximately 1/2 TV line) for this case. The reasoning 
for this is quite simple. Consider the general case with a memory size M, with the data quantized to Q quantum levels. As described earlier in the report, prediction depends on the conditional expectation of the successor to a random variable X whose sample space size depends on M. In particular, the sample space size is $s = Q^M$. Thus, if M = 1, Q = 16, and the objective is to predict $x_p$ when X is known, there are exactly $16^2$, or 256, possibilities for the pair $(X, x_p)$. Therefore, if all cases were equiprobable, one would have to allow the learning period to cover at least 256 samples before every case could have been observed once. Thus for the general case of Q and M one might choose as the minimum learning period length $L = Q^{M+1}$. This certainly does not represent the optimum learning period length, but it does provide a guideline as to the minimum learning period length. One might govern the upper bound of the learning period size by investigating the changes in the structure of the statistics as more and more samples are observed. In any case it is advantageous to keep the learning period size as small as possible, since the data are suspected to be highly nonstationary.


Table 6-1-Element Compression Ratios for Various Learning Periods with Q = 64 (6 Bits/Element), M = 1, and T = ±2

Figure        Element compression ratio for learning period L of:
number        480 TV   240 TV   48 TV    24 TV    16 TV    10 TV    2 TV
              lines    lines    lines    lines    lines    lines    lines
12            5.478    5.740    5.970    5.978    5.935    5.964    5.601
13            4.814    4.820    5.023    5.024    5.006    4.967    4.725
14            4.322    4.432    4.725    4.720    4.709    4.699    4.479
15            4.873    4.935    5.063    5.082    5.094    --       4.878
16            3.908    3.962    4.053    4.053    4.042    4.062    3.897
17            3.688    3.710    3.822    3.814    3.810    3.793    3.660
18            4.206    4.205    4.329    4.335    4.362    4.400    4.300
19            2.376    2.385    2.450    2.460    2.452    2.447    2.381
20            4.256    4.237    4.461    4.533    4.527    4.559    4.336
21            9.550    9.527    10.103   10.205   10.381   10.453   10.025
Cumulative    4.251    4.292    4.452    4.465    4.462    4.462    4.290
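The sizing rule for the learning period, together with the number of M-cubes ($s = Q^M$) that must each hold a summer and a counter, can be tabulated with a few lines of Python. This only evaluates the two formulas given in the text; the function names are hypothetical.

```python
def memory_cubes(q: int, m: int) -> int:
    """Sample-space size s = Q**M: number of M-cubes, each with a summer and a counter."""
    return q ** m


def min_learning_period(q: int, m: int) -> int:
    """Guideline L = Q**(M+1): number of distinct (X, successor) pairs, i.e. the
    fewest samples in which every pair could appear once."""
    return q ** (m + 1)


for q, m in [(16, 1), (64, 1), (64, 3), (64, 4)]:
    print(f"Q={q:2d} M={m}: {memory_cubes(q, m):>10,d} cubes, "
          f"L_min={min_learning_period(q, m):>12,d} samples")
```

For Q = 16 and M = 1 this reproduces the 256 samples quoted above. For 6-bit data the cube count grows from 262,144 at M = 3 to 16,777,216 at M = 4, which is consistent with the earlier remark that memory sizes larger than 3 are unwieldy for simulation.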

The results of successive experiments designed to test the performance of the conditional expectation predictor on both 6- and 4-bit data are contained in Tables 6-2 and 6-3. Figures 6-6 and 6-7 depict these results as bar plots. A few conclusions can be drawn from these results:


(1) There is essentially no difference between the results for M = 1 and M = 2 without the statistical-neighborhood and two-dimensional prediction modes.

(2) A slight improvement in compression ratio was achieved when the learning period L 
was reduced from 480 to 16 TV lines. 

(3) Significant improvements in compression ratio were achieved with the addition of the 
neighborhood and two-dimensional predictors. 


Table 6-2-Element Compression Ratios with Q = 64 (6 Bits/Element) and T = ±2

              Element compression ratio for:
Figure        M = 1;     M = 1;     M = 1;     M = 2;     M = 2;
number        L = 480    L = 16     L = 16     L = 16     L = 16
              TV lines   TV lines   TV lines   TV lines   TV lines
                                    with SNP   with SNP   with SNP
                                                          and EAP
12            5.478      5.935      7.420      8.712      11.425
13            4.814      5.006      5.982      7.120      9.734
14            4.322      4.709      5.686      6.703      8.134
15            4.873      5.094      6.249      7.217      8.444
16            3.908      4.042      4.826      5.521      6.413
17            3.688      3.810      4.424      4.994      5.913
18            4.206      4.362      5.295      6.543      6.863
19            2.376      2.452      2.825      3.316      3.297
20            4.256      4.527      5.469      6.874      7.875
21            9.550      10.381     12.623     15.941     17.550
Cumulative    4.251      4.462      5.330      6.301      7.196

SNP = statistical neighborhood predictor; EAP = element area predictor.

Table 6-3-Element Compression Ratios with Q = 16 (4 Bits/Element) and T = ±1

              Element compression ratio for:
Figure        M = 1;     M = 1;     M = 1;     M = 2;     M = 2;
number        L = 480    L = 16     L = 16     L = 16     L = 16
              TV lines   TV lines   TV lines   TV lines   TV lines
                                    with SNP   with SNP   with SNP
                                                          and EAP
12            7.443      7.846      9.697      9.762      14.267
13            6.355      6.725      8.182      7.995      12.705
14            6.092      6.410      7.777      7.566      10.390
15            6.256      6.721      8.015      7.786      10.356
16            4.904      5.258      6.331      6.523      8.146
17            4.611      4.896      5.575      5.753      7.379
18            5.457      5.588      6.870      7.051      7.824
19            2.930      3.055      3.593      3.750      3.890
20            5.418      6.334      7.789      7.923      9.690
21            13.442     13.449     17.127     15.550     19.954
Cumulative    5.494      5.835      7.009      7.071      8.787

SNP = statistical neighborhood predictor; EAP = element area predictor.


[Bar plot of ten-picture (4800-line average) element compression ratio for: M = 1, L = 480 TV lines; M = 2, L = 480 TV lines; M = 1, L = 16 TV lines; M = 1, L = 16 TV lines, with neighborhood predictor; M = 2, L = 16 TV lines, with neighborhood predictor; M = 2, L = 16 TV lines, with both neighborhood and area predictors.]

Figure 6-6-Performance bar plot for conditional expectation predictor with Q = 64 (6 bits/element) and T = ±2 quantum levels

[Bar plot of ten-picture (4800-TV-line average) element compression ratio, including M = 2, L = 16 TV lines, with both neighborhood and area predictors.]

Figure 6-7-Performance bar plot for conditional expectation predictor with Q = 16 (4 bits/element) and T = ±1 quantum level


Comparison with Other Techniques 

Figures 6-8 and 6-9 summarize in bar-plot form the relative performance of: 

(1) The zero-order hold predictor. 

(2) The linear predictor of Reference 4 (Method I - Reference 3). 

(3) The conditional expectation predictor (Method II - Reference 3). 

The linear predictor (Method I) produces a 10-picture cumulative element compression 
ratio of about 3:1 for 6 bits per element and T = ±2. The zero-order hold predictor provides a 
compression ratio of about 4.2:1 for 6 bits per element and T = ±2 and one of about 5:1 for 4 bits 
per element and T = ±1. The conditional expectation predictor in its most elementary form (with- 
out neighborhood and two-dimensional predictors) performs slightly better than does the zero- 
order hold. The conditional expectation method along with the neighborhood and two-dimensional 
predictors shows a significant gain with a ratio of more than 7:1 for 6 bits per element and T = ±2 
and a ratio of nearly 9:1 for 4 bits per element and T = ±1. One might reason that the zero-order hold does very well with respect to the other two methods cited when the relative complexity of the schemes is considered. The only explanation as to why the zero-order hold predictor does this well is that the information source is very highly nonstationary. Certainly the zero-order hold results would be much less impressive if the information source were stationary.

Note, however, that the conditional expectation predictor does about 140 percent better than the linear predictor and about 70 percent better than the zero-order hold. One reason that the conditional expectation predictor does so much better than the linear predictor is that the former is not
restrictive with respect to linear or nonlinear operations and therefore is able to predict well de- 
spite the nonstationary character of the data. 
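The percentages quoted above follow from the 6-bit cumulative ratio of Table 6-2 (7.196, for M = 2 with both neighborhood and area predictors) and the approximate ratios given earlier for the linear predictor (about 3:1) and the zero-order hold (about 4.2:1):

$$\frac{7.196}{3.0} \approx 2.40 \;\Rightarrow\; \text{about 140 percent better}, \qquad \frac{7.196}{4.2} \approx 1.71 \;\Rightarrow\; \text{about 70 percent better}.$$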

It was mentioned earlier that the incorporation of the statistical neighborhood predictor as 
an alternate prediction mode contributes to the coding costs since the receiver must determine 
which statistics were used to make the prediction. One way to overcome this problem would be 
to constrain the transmitter always to make predictions from the neighborhood statistics. Pre- 
liminary results with the technique have shown that the mechanism is not able to predict nearly 
so well when only neighborhood predictions are permitted. 

[Bar plot of ten-picture (4800-line average) element compression ratio for: linear predictor, M = 0, L = 20, T = ±2.5; linear predictor, M = 3, L = 20, T = ±2.5; zero-order hold predictor, T = ±2; conditional expectation predictor (T = ±2) with M = 1, L = 480 lines; M = 2, L = 480 lines; M = 1, L = 16 lines; M = 1, L = 16 lines, with neighborhood predictor; M = 2, L = 16 lines, with neighborhood predictor; M = 2, L = 16 lines, with both neighborhood and area predictors.]

Figure 6-8-Performance of conditional expectation predictor relative to linear predictor of Reference 4 and zero-order hold predictor with Q = 6 bits per element


[Bar plot of ten-picture (4800-line average) element compression ratio for: zero-order hold predictor, T = ±1; conditional expectation predictor (T = ±1) with M = 1, L = 480 lines; M = 2, L = 480 lines; M = 1, L = 16 lines; M = 1, L = 16 lines, with neighborhood predictor; M = 2, L = 16 lines, with both neighborhood and area predictors.]

Figure 6-9-Performance of conditional expectation predictor compared with performance of zero-order hold predictor with T = ±1 and Q = 4 bits per element

An explanation for the basis of the choice of the allowable prediction error T seems to be necessary at this point. Obviously the selection of T is very important to the performance of the prediction mechanism. Since the information source here is video data representing cloud-cover pictures, the effect of the choice of T can be easily observed when the data processed by the prediction mechanism are displayed. The problem here is that a judgment of the quality of a compressed picture must be made subjectively by eye; thus the only solution is to try different thresholds until the maximum threshold which allows retention of minimum acceptable picture quality is determined. The choices of T = ±2 quantum levels for the 6-bit case and T = ±1 quantum level for the 4-bit case were made after experimenting with a number of thresholds. Figure 6-10 is a picture showing the effects of too large a value of T with the data quantized to 64 levels and T = ±4.



Figure 6-10— Effect of choosing too large a value of T (allowable prediction 
error). In this case Q = 64 (6 bits/element) and T = ±4 quantum levels. 


Coding Considerations 

The prediction problem, while not completely defined, has certainly been investigated more 
thoroughly than has the coding problem. The most important question is, "After prediction what 
does the transmitter send to the receiver?" This report will not deal explicitly with the coding 
problem, but will offer a few observations about it. Actually, the problem of coding for a data 
compression system is not an easy one, and very little work has been done in this area. 

The problem with most standard coding schemes is that they require knowledge of the statistics of the data. The prediction philosophy clearly states that no a priori knowledge of the
statistics is necessary. It therefore seems reasonable that the coding philosophy should not be 
constrained by this requirement either.* 

In order to evaluate any hypothesis adequately it is helpful to have some standard of comparison which is optimum in some sense. Suppose that P is the probability of making an accurate prediction and also that each of Q levels is equally likely when accurate prediction is not possible. If it is also assumed that the ability to predict is sample-to-sample independent, then theory shows that in the noise-free case a bit compression ratio (including coding costs) of

$$C_B = \frac{\log_2 Q}{P \log_2\left(\frac{1}{P}\right) + (1 - P)\log_2\left(\frac{Q}{1 - P}\right)} \tag{10}$$

can be approached with optimum coding. Figure 6-11 is a family of curves of bit compression ratio $C_B$ against element compression ratio $C_E$ with $\log_2 Q = 4$ and 6. The probability $P$ of predicting accurately is related to the element compression ratio $C_E$ by

$$P = 1 - \frac{1}{C_E} \tag{11}$$

Figure 6-11-Bit compression ratio $C_B$ versus element compression ratio $C_E$ for $\log_2 Q = 4, 6$
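Equations (10) and (11) can be combined into a small function that maps an element compression ratio to the ideal bit compression ratio. This is a sketch under the same assumptions as the text (independent predictions, equiprobable levels when prediction fails, noiseless channel, optimum coding); the function name is hypothetical. Applied to the cumulative element ratios of Table 6-4 it agrees with the tabulated bit ratios to within about 0.01.

```python
import math


def bit_compression_ratio(c_e: float, q: int) -> float:
    """Ideal noiseless bit compression ratio C_B for element ratio C_E.

    P = 1 - 1/C_E (Equation 11). The per-element entropy is
    P*log2(1/P) + (1-P)*log2(Q/(1-P)), and C_B = log2(Q) / entropy (Equation 10).
    """
    if c_e < 1.0:
        raise ValueError("element compression ratio must be at least 1")
    p = 1.0 - 1.0 / c_e
    # Unpredicted elements: probability (1-P)/Q for each of the Q levels.
    entropy = (1.0 - p) * math.log2(q / (1.0 - p))
    if p > 0.0:
        # Accurate predictions: the indication that the prediction was accepted.
        entropy += p * math.log2(1.0 / p)
    return math.log2(q) / entropy


for c_e, q in [(7.196, 64), (8.787, 16)]:
    print(c_e, q, round(bit_compression_ratio(c_e, q), 3))
```

With $C_E = 1$ (nothing predictable) the function returns 1, as it should, and the gap between $C_E$ and $C_B$ widens as $C_E$ grows because every element still costs the $P\log_2(1/P)$ signaling term.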


Table 6-4-Element Compression Ratios and Corresponding Bit Compression Ratios for Conditional Expectation Predictors

              Q = 64 levels; T = ±2;       Q = 16 levels; T = ±1;
              M = 2; L = 16 TV lines       M = 2; L = 16 TV lines
Figure        Element       Bit            Element       Bit
number        compression   compression    compression   compression
              ratio         ratio          ratio         ratio
12            11.425        6.285          14.267        6.189
13            9.734         5.481          12.705        5.607
14            8.134         4.705          10.390        4.757
15            8.444         4.851          10.356        4.729
16            6.413         3.846          8.146         3.888
17            5.913         3.593          7.379         3.705
18            6.863         4.064          7.824         3.762
19            3.297         2.218          3.890         2.160
20            7.875         4.576          9.690         4.473
21            17.550        9.127          19.954        8.219
Cumulative    7.196         4.238          8.787         4.136


*Reference 4 shows some examples of coding the compressed data with variations of run-length coding. These results are interesting, and similar simulations might be made with the conditional expectation predictor.

Table 6-4 provides examples of resultant bit compression ratios for each of the 10 pictures with Q = 64 and T = ±2 and with Q = 16 and T = ±1. It must be made clear that these results are by no means bit compression ratios one could expect to obtain in practice, for the following two reasons:

(1) The results assume optimum coding which would probably not be attainable in practice. 

(2) These results apply to the noiseless channel and do not account for necessary error- 
correction coding. 

These data are presented solely as guidelines for readers who want results expressed in practical terms.


Comments on the TV Pictures 

Each of Figures 6-12 to 6-21 contains in the following order: 

(1) A photograph of the original analog picture. 

(2) A photograph of the original digital picture constructed from the analog data. 

(3) Two photographs of the digital data redisplayed after processing by the conditional 
expectation predictor. 

Reference 4 contains a good deal of information on the history and specific characteristics of 
many of these pictures, as well as a description of the techniques used to display them. 

The author will not attempt to give a detailed meteorological analysis for each picture but 
will rather provide a general comparison of the pictures processed by the conditional expectation 
predictor with the unmodified pictures as well as with those processed by other compression tech- 
niques. It is impossible for the untrained eye to pass judgment as to the retention of meteorolog- 
ical fidelity of the compressed pictures.* The only alternative for the layman is to compare the 
compressed pictures subjectively with the originals and to estimate the loss of apparent picture 
quality. 

When the pictures processed by the conditional expectation predictor are compared with the digital originals, the loss of picture quality is obvious but not objectionable. Contouring or "streakiness" in highly detailed regions seems to be the most common complaint. This contouring is caused by the ability of the prediction mechanism to predict long sequences of elements at the same level successively. This effect becomes more pronounced as T is increased. A technique which might partially solve this problem is the use of a weighted prediction error criterion where the prediction errors are accumulated until a preset threshold has been exceeded.**
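One way the accumulated-error idea could be combined with a zero-order hold is sketched below. The text does not give the weighting or the exact rule used in Reference 5, so this is only an assumed, unweighted version: signed prediction errors are summed since the last transmitted element, and a new element is sent when either the single-element error exceeds T or the running sum exceeds a second threshold, here called `t_acc` (a hypothetical parameter, as is the function name).

```python
from typing import List, Sequence


def zoh_with_accumulated_error(samples: Sequence[int], t: int, t_acc: int) -> List[int]:
    """Indices of transmitted elements for a zero-order hold that also
    refreshes when the accumulated signed error exceeds +/-t_acc."""
    sent: List[int] = []
    held = None  # value the receiver is currently repeating
    acc = 0      # signed error accumulated since the last transmission
    for k, x in enumerate(samples):
        if held is not None:
            err = x - held
            acc += err
            if abs(err) <= t and abs(acc) <= t_acc:
                continue  # prediction accepted; nothing is sent
        held, acc = x, 0  # transmit the actual element and restart the sum
        sent.append(k)
    return sent


ramp = [10, 11, 12, 13, 14]
print(zoh_with_accumulated_error(ramp, t=2, t_acc=100))  # acts as a plain hold
print(zoh_with_accumulated_error(ramp, t=2, t_acc=2))    # refreshes more often
```

On a slow ramp the plain hold sends elements 0 and 3, flattening the ramp into a step, while the accumulated criterion sends elements 0, 2, and 4. The price is a lower element compression ratio in exchange for less streaking.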

The pictures quantized to 4 bits per element with T = ±1 (Figures 6-12(c) to 6-21(c)) exhibit a higher degree of picture quality degradation than do the pictures quantized to 6 bits per element with T = ±2 (Figures 6-12(d) to 6-21(d)). The reason for this is that a threshold of ±1 quantum level at 4 bits per element (1 of 16 levels, or about 6 percent of full scale) is a larger percentage error than a threshold of ±2 quantum levels at 6 bits per element (2 of 64 levels, or about 3 percent). Both the 6-bit and the 4-bit pictures are displayed with 16 shades of gray.

The 6-bit compressed pictures with T = ±2 are acceptable, although the 4-bit compressed pictures
with T = ±1 seem to be at the threshold of acceptability. Perhaps the best compromise would be


*The term "compressed" picture does not imply that the picture geometry is made smaller or more compact in any way. It is simply true that the amount of data required to transmit a "compressed" picture over a communications link is less than the amount of data required to send the original picture.

**This scheme was implemented by Davisson of Princeton and described in the report of his work in the 1965 Goddard Summer Workshop (Reference 5).

(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 14.267; bit compression ratio, 6.189.

(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 11.425; bit compression ratio, 6.285.

Figure 6-12-Pictures from Tiros III, orbit 4, frame 2, camera 2; direct transmission from satellite; principal point, 43.6N, 95.5W; subsatellite point, 41.0N, 89.2W

(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 12.705; bit compression ratio, 5.607.

(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 9.734; bit compression ratio, 5.481.

Figure 6-13-Pictures from Tiros III, orbit 4, frame 3, camera 2; direct transmission from satellite; principal point, 43.4N, 95.0W; subsatellite point, 40.8N, 88.8W







(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 10.390; bit compression ratio, 4.757.

(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 8.134; bit compression ratio, 4.705.

Figure 6-14-Pictures from Tiros III, orbit 4, frame 4, camera 2; direct transmission from satellite; principal point, 43.0N, 94.0W; subsatellite point, 40.5N, 88.1W

(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 10.356; bit compression ratio, 4.729.

(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 8.444; bit compression ratio, 4.851.

Figure 6-15-Pictures from Tiros III, orbit 4, frame 5, camera 2; direct transmission from satellite; principal point, 42.6N, 93.0W; subsatellite point, 40.1N, 87.3W






(a) Analog original.

(b) Digital original.

(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 8.146; bit compression ratio, 3.888.

(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 6.413; bit compression ratio, 3.846.

Figure 6-16-Pictures from ... camera 1; taped before transmission from ...; subsatellite point, 10.3N, 0.6W



(a) Analog original.

(b) Digital original.

(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 7.824; bit compression ratio, 3.762.

(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 6.863; bit compression ratio, 4.064.

Figure 6-18-Pictures from Tiros V, orbit 3143, frame 6, camera 1; direct transmission from satellite; principal point, 32.4N, 69.3W; subsatellite point, 33.9N, 73.4W


L 





(a ) Analog original . 



(c) Processed copy generated by conditional expecta- 
tion predictor with neighborhood and two-dimensional 
predictors. Q = 4 bits per TV element; T =± 1 level; 
L = 1 6 TV lines. Element compression ratio, 7.379; bit 
compression ratio, 3.705. 



(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional
predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 5.913;
bit compression ratio, 3.593.


Figure 6-17-Pictures from Tiros III, orbit 102, frame 2, camera 1; taped before transmission from
satellite; principal point, 13.3N, 5.6W; subsatellite point, 11.9N, 1.9W


(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional
predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 3.890;
bit compression ratio, 2.160.


(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional
predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 3.297;
bit compression ratio, 2.218.


Figure 6-19-Pictures from Tiros VI, orbit 1100, frame 15, camera 1; direct transmission from satellite;
principal point, 28.3N, 79.0W; subsatellite point, 26.1N, 79.8W
(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional
predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 9.690;
bit compression ratio, 4.473.


(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional
predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 7.875;
bit compression ratio, 4.576.


Figure 6-20— Pictures from Tiros VI, orbit 18, frame 21, camera 1; taped before transmission from 
satellite; principal point, 52.5N, 45.2W; subsatellite point, 50.4N, 37.3W






(c) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional
predictors. Q = 4 bits per TV element; T = ±1 level; L = 16 TV lines. Element compression ratio, 19.954;
bit compression ratio, 8.219.


(d) Processed copy generated by conditional expectation predictor with neighborhood and two-dimensional
predictors. Q = 6 bits per TV element; T = ±2 levels; L = 16 TV lines. Element compression ratio, 17.550;
bit compression ratio, 9.127.


Figure 6-21 —Pictures from Tiros VI, orbit 3692, frame 31, camera 1; taped before transmission 
from satellite; principal point, 36.8N, 57.2W; subsatellite point, 33.1N, 48.7W
to use data quantized to 5 bits per element and allow T = ±1, which is the same percentage error
as 6 bits per element with T = ±2. Thus one would expect the element compression ratios for the
5-bit, T = ±1 case to be about the same as those for the 6-bit, T = ±2 case. If these 5-bit pictures
were also displayed with 16 gray shades, then they would possess about the same quality as the 6-bit
pictures. The first-order entropies of the unmodified digital data quantized to 4, 5, and 6 bits per
element are given in Table 6-5. The entropies for the 5- and 6-bit pictures are almost exactly the
same, while the entropies for the 4-bit pictures are roughly one bit per element smaller.
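The entries of Table 6-5 are first-order entropies, H = -Σ p(i) log2 p(i), computed from the histogram of quantized levels. A minimal Python sketch of that computation follows. It assumes, as a hypothetical, that the 5- and 4-bit versions are obtained by dropping least-significant bits of the 6-bit data; the text does not say how the coarser data were produced, and the toy input below is not picture data.

```python
import math
from collections import Counter
from typing import Iterable, List


def first_order_entropy(samples: Iterable[int]) -> float:
    """First-order entropy, in bits per element, of a sequence of quantized levels."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def requantize(samples: Iterable[int], from_bits: int, to_bits: int) -> List[int]:
    """Hypothetical requantizer: keep only the to_bits most significant bits of each level."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]


if __name__ == "__main__":
    six_bit = list(range(64))  # toy data: each 6-bit level used exactly once
    for bits in (6, 5, 4):
        print(bits, first_order_entropy(requantize(six_bit, 6, bits)))
```

With uniformly distributed toy data every dropped bit removes a full bit of entropy; the measured pictures behave differently, showing only a small drop from 6 to 5 bits.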

The reader may find it interesting to 
compare the pictures processed by the condi- 
tional expectation predictor with those proc- 
essed by the zero-order hold and linear pre- 
dictors which are discussed in Reference 4. 

In general, the pictures processed by the con- 
ditional expectation predictor are of slightly 
better quality than zero-order-hold-predicted 
pictures. The pictures processed by the linear 
predictor are of higher quality than those proc- 
essed by either of the other two methods. 


Table 6-5-First-Order Entropies

          First-order entropy (bits per TV element) for:
Figure    Q = 64 levels        Q = 32 levels        Q = 16 levels
number    (6 bits/TV element)  (5 bits/TV element)  (4 bits/TV element)
6-12      4.510                4.472                3.512
6-13      4.736                4.622                3.647
6-14      4.777                4.678                3.701
6-15      4.728                4.656                3.689
6-16      4.257                4.201                3.227
6-17      4.606                4.566                3.578
6-18      4.516                4.449                3.483
6-19      4.649                4.561                3.590
6-20      4.891                4.813                3.829
6-21      4.478                4.443                3.532


APPLICATIONS FOR DATA COMPRESSION SYSTEMS 

Some of the most obvious applications of data compression systems are in deep-space communications,
earth-orbiting operational spacecraft, and land-line data transmission. Figure 6-22 is a block
diagram of both the transmitter and receiver ends of a data compression system model. At the
transmitter end, the predictor accepts raw data from the information source. The predictor contains
arithmetic, memory, and control functions which are arranged according to some prediction algorithm.
Each raw data sample is compared with the corresponding predicted sample, and the prediction error E
is determined. If the prediction error exceeds some preset threshold T, the raw data sample must be
transmitted in unmodified form. If, however, the prediction error is less than T, the sample is
predictable and need not be transmitted. The comparator output is also fed back to the predictor to
update the prediction mechanism. The encoder accepts the raw unpredictable samples, as well as
indications of predictable samples, and arranges this information according to some appropriate
code. The information rate at the output of the encoder is therefore inherently nonuniform. Since
the main data-storage device would probably require a uniform read-in rate, a smoothing buffer is
necessary.
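The need for the smoothing buffer can be seen in a toy simulation: the encoder produces a variable number of bits per sample period, while the storage device or channel removes a fixed number. The sketch below assumes a hypothetical code in which a predictable sample costs 1 bit and an unpredictable sample costs 7 bits (a flag plus a 6-bit value); neither figure comes from this paper.

```python
from typing import List, Tuple


def simulate_buffer(bits_in: List[int], drain_per_tick: int) -> Tuple[List[int], int]:
    """Track the occupancy of a smoothing buffer fed at a variable rate, read at a fixed rate.

    bits_in[k] is the number of encoder output bits produced in sample period k.
    Returns the occupancy after each period and the peak occupancy.
    """
    level = 0
    peak = 0
    history: List[int] = []
    for produced in bits_in:
        # Add this period's encoder output, then drain at the constant read-in rate.
        # If the buffer runs dry, fill bits would be sent, so the level clamps at zero.
        level = max(0, level + produced - drain_per_tick)
        peak = max(peak, level)
        history.append(level)
    return history, peak


if __name__ == "__main__":
    # Hypothetical costs: 1 bit per predictable sample, 7 bits per unpredictable sample.
    per_sample = [7, 1, 1, 7, 7, 1, 1, 1]
    print(simulate_buffer(per_sample, drain_per_tick=3))
```

Sizing the buffer then amounts to bounding this peak for the longest run of unpredictable samples that the mission data are expected to contain.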

At the receiver end, the decoder provides the predictor with all the data necessary to reconstruct
the original message within the allowable prediction error. The predictor at the receiver is an
exact copy of the predictor at the transmitter. After reconstruction, the message is transferred to
the information sink.
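The transmitter and receiver loops just described can be sketched with the simplest predictor, a zero-order hold that predicts the last transmitted value. This is a swapped-in illustration, not the conditional expectation predictor of this paper; the parameter `tol` plays the role of the threshold T, and `None` stands for the indication that a sample was predictable. Because both ends run the same predictor on the same transmitted values, the reconstruction error never exceeds `tol`.

```python
from typing import List, Optional


def compress(samples: List[int], tol: int) -> List[Optional[int]]:
    """Transmitter: send a raw sample only when the zero-order-hold prediction misses by more than tol."""
    out: List[Optional[int]] = []
    pred: Optional[int] = None  # last transmitted value, as seen by the receiver
    for x in samples:
        if pred is not None and abs(x - pred) <= tol:
            out.append(None)    # predictable: only an indication is sent
        else:
            out.append(x)       # unpredictable: send the raw sample unmodified
            pred = x            # feedback: update the prediction mechanism
    return out


def decompress(stream: List[Optional[int]]) -> List[Optional[int]]:
    """Receiver: an exact copy of the transmitter's predictor rebuilds each sample."""
    recon: List[Optional[int]] = []
    pred: Optional[int] = None
    for item in stream:
        if item is not None:
            pred = item
        recon.append(pred)
    return recon


def element_compression_ratio(stream: List[Optional[int]]) -> float:
    """Samples entering the predictor divided by raw samples actually sent."""
    sent = sum(1 for s in stream if s is not None)
    return len(stream) / sent


if __name__ == "__main__":
    data = [10, 11, 12, 13, 20, 21, 19, 19]
    stream = compress(data, tol=1)
    print(stream, decompress(stream), element_compression_ratio(stream))
```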

The bit compression ratio can be a very useful parameter to a communications system designer. If
C_B represents the bit compression ratio, then the designer can choose to reduce the original
transmission time T (here a time, not the prediction threshold) to T/C_B for the same bandwidth or,
alternatively, reduce the original bandwidth B_w to B_w/C_B. If the aim is to save power, and
thereby spacecraft weight, the signal power S can be reduced to S/C_B along with the bandwidth
without changing the S/N ratio, since the thermal noise power is directly proportional to the
bandwidth. In practice, however, one would probably choose to employ data compression techniques to
achieve high communication channel efficiency. This can be achieved by keeping the information rate
close to the channel capacity at all times, which implies a channel with the capability of adapting
to the time-varying information rate.
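The trade-offs above are simple divisions by C_B. A minimal sketch follows, with hypothetical names and example figures; it spends the ratio on one resource at a time, as the text describes.

```python
from dataclasses import dataclass


@dataclass
class LinkBudget:
    time_s: float        # original transmission time T
    bandwidth_hz: float  # original bandwidth B_w
    power_w: float       # original signal power S


def spend_compression(link: LinkBudget, c_b: float, use: str) -> LinkBudget:
    """Apply a bit compression ratio c_b to one resource.

    use="time":      T -> T/C_B at the same bandwidth and power.
    use="bandwidth": B_w -> B_w/C_B at the same power (S/N improves).
    use="power":     B_w -> B_w/C_B and S -> S/C_B; thermal noise power scales with
                     bandwidth, so the S/N ratio is unchanged.
    """
    if c_b <= 0:
        raise ValueError("compression ratio must be positive")
    if use == "time":
        return LinkBudget(link.time_s / c_b, link.bandwidth_hz, link.power_w)
    if use == "bandwidth":
        return LinkBudget(link.time_s, link.bandwidth_hz / c_b, link.power_w)
    if use == "power":
        return LinkBudget(link.time_s, link.bandwidth_hz / c_b, link.power_w / c_b)
    raise ValueError("use must be 'time', 'bandwidth', or 'power'")


if __name__ == "__main__":
    base = LinkBudget(time_s=600.0, bandwidth_hz=1.0e5, power_w=2.0)  # hypothetical link
    for option in ("time", "bandwidth", "power"):
        print(option, spend_compression(base, 4.0, option))
```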
Figure 6-22-Block diagram of data compression system


CONCLUDING REMARKS 

The most important outcome of this work was that the conditional expectation predictor produced
compression ratios significantly greater than either the zero-order hold or the linear predictors.
This result holds for each of the 10 TV frames used in the study and is significant because it
shows that the conditional expectation predictor yields superior compression ratios despite the
suspected nonstationary character of the information source. To summarize the numerical results,
the conditional expectation predictor produced bit compression ratios (assuming ideal coding in the
noiseless case) exceeding 5:1 on a number of single TV frames. It is also important that the
cumulative compression ratio (10-picture average) exceeded 4:1 for both the 6- and the 4-bit-per-TV-
element cases. At the same time, the compressed pictures (Figures 6-12 to 6-21) seem to retain at
least an acceptable level of quality.

Many interesting problems associated with adaptive data compression systems require further
investigation. The prediction mechanism itself should be developed further; for example, the
optimal relationship between learning period and memory size remains to be determined. Certainly
the determination of efficient coding schemes for the adaptive data compression system on the
noiseless channel is one of the most important problems still to be solved.

Further investigations might also simulate noise environments for possible missions and analyze
their effects on the noiseless-case coding structure in order to develop efficient error-correction
codes. Investigations of this sort would eventually allow laboratory simulation of a complete
compression system from the information source to the transmitter, through the communication
channel to the receiver, and finally to the information sink. This arrangement would permit
feasibility studies for specific missions as well as establish system design guidelines.
7. DATA COMPACTION STUDIES


C. R. Laughlin 

Goddard Space Flight Center 
Greenbelt, Maryland 


N67-27409 


The Systems Division of GSFC has been pursuing an active study program aimed at develop- 
ing advanced techniques for reducing the channel capacity requirements for real-time digital 
transmission of imagery signals. Primary emphasis has been on studying the statistical nature of 
Tiros and Nimbus cloud cover photographs and on applying the modern mathematical theories of 
communication to these results. It is recognized, of course, that fundamentally the only way that 
a reduction in real-time channel capacity requirements can be effected is by simply not transmit- 
ting all of the original signal structure. However, modern information theory indicates that tech- 
niques exist (although most are yet to be developed) whereby a specific fidelity criterion can be 
attained although portions of the original signal structure are omitted; that is, the source output 
can be reduced without deterioration of the information content by means of redundancy removal. 


Further discussion can be facilitated by referring to Figure 7-1 where a source encoder is 
shown. The operation of the source encoder is defined as the organization of the source signal in 
such a manner as to remove redundancy by means of future prediction of the signal samples. 
The source encoder forms a prediction on the basis of some amount of a priori knowledge of the 
statistical nature of the signal and upon complete knowledge of the past structure of the signal over 
some finite past history. The source decoder is assumed to operate in a corresponding manner so 
that (neglecting errors in the transmission channel), if a prediction of a future signal sample is 
found to be correct, that particular sample need not be transmitted to the receiver. If there exists 
a reasonable degree of correlation between signal samples (picture elements), then the prediction 
process will be successful most of the time and the total number of samples that must be sent will 
be greatly reduced. 






The best predictor, in the information theory sense, is that operation which minimizes the entropy
of the predictor's error signal. There are several approaches to the design of a predictor. For
example, a linear Wiener predictor forms a weighted sum of previous signal values, with the weights
chosen so that the mean square error is minimized. Alternatively, the predictor might employ the
locally derived conditional probability distribution of the signal samples; in this case it is
referred to as a Markov predictor.
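A Markov predictor in its simplest form conditions on the previous element only and predicts the level that most often followed it. The sketch below is a hypothetical first-order illustration: the training data, the tie-breaking rule, and the zero-order-hold fallback for unseen levels are all assumptions, not the Systems Division's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_markov(samples: List[int]) -> Dict[int, Counter]:
    """Count, for each level, which levels followed it (first-order transition counts)."""
    table: Dict[int, Counter] = defaultdict(Counter)
    for prev, nxt in zip(samples, samples[1:]):
        table[prev][nxt] += 1
    return table


def predict_next(table: Dict[int, Counter], prev: int) -> int:
    """Predict the most probable successor of prev; fall back to a zero-order hold if unseen."""
    followers = table.get(prev)
    if not followers:
        return prev
    # Counter.most_common orders equal counts by first occurrence.
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    table = train_markov([0, 2, 0, 2, 0, 2, 5, 5, 5])
    print([predict_next(table, level) for level in (0, 2, 5, 9)])
```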
In order to solidify the ideas of statistical redundancy in the information theory sense, refer to
Figure 7-1, where the symbol C_s designates the source capacity, or the maximum information that the
source could possibly deliver. The maximum value is attained when each of the ℓ possible levels i
occurs with equal probability, in which case C_s = log₂ ℓ bits/element. The actual average
information conveyed by each element in the sequence (or the entropy of the message) depends on the
actual probability p(i) with which each of the ℓ levels occurs and is given by

$$ H = -\sum_{i} p(i)\,\log_2 p(i) \quad \text{bits/element}. \qquad (1) $$

The nonnegative quantity (C_s - H) depends only on the first-order probability distribution through
Equation 1 and is, therefore, independent of any sample-to-sample correlation. This is called
nonpredictive redundancy. On the other hand, the average information per element, I, depends on the
statistical influence extending over past elements, so that if S_j denotes a state representing a
particular one of the possible past sequences, I is given by

$$ I = -\sum_{i}\sum_{j} p(i, S_j)\,\log_2 p_{S_j}(i) \quad \text{bits/element}, \qquad (2) $$

where p(i, S_j) is the joint probability that state S_j occurs and is followed by an element of the
ith level, and p_{S_j}(i) is the conditional probability that, given the occurrence of state S_j,
the next element will assume the ith level.

The difference (H - I) is called the predictive redundancy because it gives a quantitative measure
of how well an element in the sequence can be predicted from a knowledge of past history and,
therefore, provides a measure of how well a predictive encoding process can perform in a given
situation. To complete the discussion of Figure 7-1, note that the total statistical redundancy R is
the difference (C_s - I) between the source capacity and the actual information rate and can be
written as the sum of the nonpredictive and the predictive redundancy:

$$ R = C_s - I = (C_s - H) + (H - I) \quad \text{bits/element}. \qquad (3) $$

The relative redundancy r will be useful in what follows and is found by normalizing the total
redundancy to the source capacity. That is,

$$ r = \frac{R}{C_s} = 1 - \frac{I}{C_s}, \qquad (4) $$

a dimensionless quantity lying between 0 and 1.
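Equations 1 through 4 can be estimated directly from a sequence of quantized levels. The sketch below assumes that the state S_j is simply the previous element (a first-order Markov assumption, one choice among many possible state definitions), and it computes H over the same "next" elements used for I so that the estimate of the predictive redundancy H - I is never negative. The function name is illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def redundancy_measures(samples: List[int], levels: int) -> Dict[str, float]:
    """Estimate C_s, H, I, the two redundancies, R and r (Equations 1-4) from one sequence."""
    c_s = math.log2(levels)                     # source capacity, bits/element
    nxt = samples[1:]
    n = len(nxt)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(nxt).values())
    pairs = Counter(zip(samples, samples[1:]))  # joint counts of (state S_j, next level i)
    states = Counter(samples[:-1])              # occurrences of each state S_j
    # I = -sum p(i, S_j) log2 p_{S_j}(i), with p_{S_j}(i) = count(S_j, i) / count(S_j)
    i_rate = -sum((c / n) * math.log2(c / states[s]) for (s, _), c in pairs.items())
    r_total = c_s - i_rate
    return {
        "C_s": c_s,
        "H": h,
        "I": i_rate,
        "nonpredictive": c_s - h,
        "predictive": h - i_rate,
        "R": r_total,
        "r": r_total / c_s,
    }


if __name__ == "__main__":
    # An alternating two-level sequence is fully predictable from the previous element.
    print(redundancy_measures([0, 1, 0, 1, 0, 1, 0, 1, 0], levels=2))
```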

Extensive investigation conducted by members of the Systems Division has revealed that 
element compression ratios of 3:1 or 4:1 can be readily attained on representative photographs. 
This ratio is taken as the ratio of the number of picture elements going into the source encoder 
(predictor) to the number of picture elements coming out, and does not account for any additional 
redundancy that may be added back in by the channel encoder. Such compression ratios do not 
come anywhere near the dreams and expectations of early investigators in the field and have even 
caused many investigators to drop their studies as being hopeless. A number of highly sophisti- 
cated approaches have been tried in the Systems Division by computer simulation and by funda- 
mental statistical analysis. These include adaptive techniques optimized over local statistics. 
Results to date are in substantial agreement with those obtained by other organizations and with 
what has been reported in the open literature. 

There is much evidence to support the contention that, for a large class of imagery signals, 
the autocorrelation function for small displacements is nearly exponential in behavior. Thus, 
linear prediction on the basis of previous elements alone will show little improvement when the 
past history is extended over more than one or two previous elements. This implies that a linear,
first-order Markov predictor (sometimes called a zero-order hold predictor) will perform nearly
as well as the best device that can be devised. Our studies verify this and leave us with the
conclusion that, on the whole, compression ratios on the order of two or three are the best that can
be expected by exploiting the statistical properties of imagery signals with linear processing methods.
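The claim that extra predictor taps add little when the autocorrelation is exponential can be checked numerically. Assuming a normalized autocorrelation ρ(k) = a^k, an idealization of the "nearly exponential" behavior described above, the Levinson-Durbin recursion below solves the normal equations for the minimum-mean-square-error linear predictor of each order. For this autocorrelation every tap beyond the first comes out zero and the mean-square error stays at 1 - a².

```python
from typing import List, Tuple


def levinson_durbin(rho: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the Toeplitz normal equations for a linear predictor of the given order.

    rho[k] is the autocorrelation at lag k (rho[0] is the variance). Returns the
    coefficients a[0..order-1] of x_hat[t] = sum_j a[j] * x[t-1-j] and the mean-square error.
    """
    a: List[float] = []
    err = rho[0]
    for m in range(1, order + 1):
        # Part of rho[m] not already explained by the order-(m-1) predictor.
        acc = rho[m] - sum(a[j] * rho[m - 1 - j] for j in range(m - 1))
        k = acc / err                        # reflection (partial correlation) coefficient
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    corr = 0.9                               # assumed correlation between adjacent elements
    rho = [corr ** k for k in range(6)]      # assumed exponential autocorrelation
    coeffs, mse = levinson_durbin(rho, order=5)
    print([round(c, 6) for c in coeffs], round(mse, 6))
```

A non-exponential autocorrelation such as [1, 0.5, 0.5] does give the second tap a nonzero weight (1/3), so the flat result above is a property of the assumed exponential form rather than of the recursion.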

This conclusion is not altogether discouraging because, while the gains are small, the zero-order
hold predictor is relatively simple in terms of equipment complexity. An equally simple complete
transmission system that includes such a predictor and fully realizes this potential gain remains
to be devised. With such a system, a compression ratio of 2:1 or 3:1 can be used for a
corresponding saving in transmission bandwidth, transmission time, or total energy used. For
deep-space camera missions, this is clearly a worthwhile goal.

The first problem to be solved in this direction is to devise an efficient coding scheme for
representing the compressed signal. It is hypothesized here that the predictor is a device whose
output, whenever the prediction process fails, is a quantized signal at one of ℓ levels
representing the actual signal level during that sample interval. It is further assumed that, out
of a video message n elements in length, Q of the elements are nonpredictable, so that the inverse
of the compression ratio, given by

$$ \frac{1}{K} = \frac{Q}{n}, \qquad (5) $$

approaches the limiting probability of occurrence of a nonpredictable element, which will be
designated by q. In addition, when the prediction process succeeds, the predictor is assumed to
produce an additional signal (say, level ℓ + 1); this occurs a fraction (n - Q)/n = 1 - (1/K) of the
time. The limiting probability of occurrence of this event will be designated by p, so that
p + q = 1.

On the basis of the above, the minimum number of bits per element, N, required to represent the
predictor output with an ideal Shannon-Fano code can easily be shown to be

$$ N = -\left[p\log_2 p + q\log_2 q\right] + qL \quad \text{bits/element}, \qquad (6) $$

where each of the ℓ signal levels is taken to be equally likely and L = log₂ ℓ. The bracketed
term of Equation 6, taken with its minus sign, assumes a value between zero and one for all
admissible values of p and q, so that N lies within the bounds

$$ qL \le N \le 1 + qL. \qquad (7) $$

Expression 7 states the encoding problem in concise form. For example, with a six-bit-per-element
picture signal (L = 6) which can be compressed by a 3:1 ratio (i.e., K = 3, so that q = 1/K = 1/3
and qL = 2), ideally no fewer than two, nor more than three, bits per element are required to
encode the predictor output.
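Equation 6 and the bounds of Expression 7 are easy to evaluate. The short sketch below, with illustrative function names, reproduces the 3:1, six-bit example (q = 1/3, L = 6).

```python
import math


def binary_entropy(q: float) -> float:
    """-[p log2 p + q log2 q] with p = 1 - q: the bracketed term of Equation 6."""
    return -sum(x * math.log2(x) for x in (q, 1.0 - q) if x > 0.0)


def bits_per_element(q: float, level_bits: float) -> float:
    """Equation 6: N = -[p log2 p + q log2 q] + qL."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return binary_entropy(q) + q * level_bits


if __name__ == "__main__":
    K, L = 3, 6          # 3:1 compression of a six-bit signal, as in the text
    q = 1 / K
    print(q * L, bits_per_element(q, L), 1 + q * L)  # lower bound, N, upper bound
```

The gap between the upper bound and N is 1 - binary_entropy(q). A code that spends one full indicator bit on every element therefore loses nothing when q = 1/2 and loses the most when q is small, that is, when the compression ratio is high.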

Early attempts to encode compressed data were based on run-length encoding schemes, which depend
upon predictable elements occurring in groups or runs. Studies in the Systems Division have resulted
in a different approach which always requires exactly the upper bound of Expression 7, 1 + qL bits
per element, regardless of how the predictable elements are grouped. This coding scheme is shown in
Figure 7-2, its timing diagram in Figure 7-3, and its logic flow diagram in Figure 7-4.

Figure 7-2-Functional diagram

The essential feature of this system is a highly efficient coding scheme which is practical to
instrument. A digital-scan camera is required, which effectively produces a variable-rate source
and thereby reduces buffer requirements. The camera scans one line at a time at an element-per-second
rate equal to the bit transmission rate. The predictor also operates at this rate and stores the
value of each unpredictable element in the Q store. At the same time, the T-register is loaded with
a sequence of prediction-success indicator digits which represent the position of each unpredictable
(or predictable) element in the line. This sequence is fed directly to the R-register for
transmission to the channel. At the completion of the T readout, the camera scan is stopped at the
end of the given line while the Q store is fed to the R-register for transmission. The readout
control is an electronic commutator which formats the output sequence.
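The line format described above, n indicator digits followed by the Q store, can be sketched in a few lines. The conventions here are hypothetical: a flag of 1 marks an unpredictable element, stored values are sent most significant bit first, and the predictor is a zero-order hold with tolerance `tol`. The encoded line always costs n + QL bits, that is, 1 + qL bits per element.

```python
from typing import List, Optional


def encode_line(line: List[int], tol: int, level_bits: int) -> List[int]:
    """Position-encode one scan line: n indicator bits (T-register), then the Q store."""
    flags: List[int] = []
    store: List[int] = []
    pred: Optional[int] = None
    for x in line:
        if pred is not None and abs(x - pred) <= tol:
            flags.append(0)              # predictable element
        else:
            flags.append(1)              # unpredictable element: keep its value
            store.append(x)
            pred = x
    value_bits = [(v >> b) & 1 for v in store for b in reversed(range(level_bits))]
    return flags + value_bits


def decode_line(bits: List[int], n: int, level_bits: int) -> List[int]:
    """Invert encode_line for a line of known length n."""
    flags, rest = bits[:n], bits[n:]
    values = iter(
        int("".join(map(str, rest[i:i + level_bits])), 2)
        for i in range(0, len(rest), level_bits)
    )
    out: List[int] = []
    current = 0
    for f in flags:
        if f:
            current = next(values)       # unpredictable: take the next stored value
        out.append(current)              # predictable: repeat the held value
    return out


if __name__ == "__main__":
    line = [10, 10, 11, 30, 30, 29, 5, 5]
    bits = encode_line(line, tol=1, level_bits=6)
    print(len(bits) / len(line), decode_line(bits, len(line), 6))
```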

The efficiency of the proposed coding scheme is indicated in Table 7-1 for ten representative
photographs, together with the performance of the best run-length coding scheme. Certain portions
of the proposed system are expected to be fabricated, and optimum synchronization methods will be
studied.
Table 7-1-Comparison of Run-Length Encoding and Position Encoding
(Data from computer analysis of zero-order hold predictor operating on 10 Tiros pictures)

             Bit Compression Ratio        Bits per Element
Picture    Run-Length    Position       Run-Length    Position
           Encoding      Encoding       Encoding      Encoding
   1         2.473         2.602          2.097         1.922
   2         2.237         2.469          2.235
   3         2.105         2.381          2.376
   4         2.263         2.481          2.209
   5         1.762         2.129          2.838         2.349
   6         1.714         2.092          2.916
   7         2.127         2.349          2.351         2.128
   8         1.216         1.568          4.112         3.188
   9         1.943         2.267          2.573
  10         4.090         3.276          1.222         1.526
Average      2.006         2.288          2.493         2.185
8. A PARAMETER EXTRACTION TECHNIQUE

J. W. Snively, Jr. 

Goddard Space Flight Center 
Greenbelt, Maryland 


INTRODUCTION 

A so-called "parameter extraction technique," which is applicable to data displayed in histogram
form, is presented herein. The method is called a parameter extraction technique rather than a data
compression technique because not all of the information contained in the original histogram is
preserved. To paraphrase a remark made by Dr. Balakrishnan, the parameters sent are tailored to the
ultimate use of the data.

The quantiler technique described in a preceding paper of this symposium* is another ex- 
ample of a parameter extraction technique. However, the technique discussed in the present 
paper is more algebraic than the statistical approach given in the preceding paper since calcu- 
lated upper and lower bounds on the magnitude of the desired quantities are used rather than 
confidence levels or hypothesis tests. 

The technique described in this paper has been implemented in flight hardware as part of the
GSFC/University of Maryland plasma experiment to be flown on Interplanetary Monitoring Platform
(IMP) flights F and G (Reference 1). The IMPs are small spin-stabilized spacecraft launched into
highly elliptical Earth orbits. In this experiment, the satellite spin is divided into 16 equal
sectors and the particles arriving at the sensor in each sector are counted; thus a histogram is
produced. Figure 8-1 shows a pair of typical histograms. Histograms which resemble a very sharp
normal distribution are expected from interplanetary space outside the magnetosphere, whereas in
the transition region just inside the shock wave that surrounds the magnetosphere, a relatively
flat curve is expected.

The goals of this experiment were to map the boundaries of the transition region and the
magnetosphere and at the same time to study relatively fast changes in the plasma intensity. These
goals could not be achieved by transmitting the complete histograms within the bandwidth allotted
to the experiment.


In IMP flights F and G, the area of the histogram and the sum of the squares of the histogram bars
are to be calculated; transmitting these two quantities constitutes the parameter extraction
technique. The present paper elaborates on this technique and shows how well certain properties of
the original histogram can be recovered from these parameters.**

Figure 8-1-Typical histograms expected in a plasma-measuring experiment



*Paper 4, pp. 55 to 73.

**Statements made throughout this paper are supported by mathematical proofs in the Appendix of Reference 2.
TECHNIQUE


Relationship between Area and Sum of Squares for a Histogram 


Assume that the area A of a histogram is known. The sum of the squares of the histogram bars, Σ,
must lie between two numbers which differ by a factor equal to the number of bars, n, in the
histogram. The larger of these two limits corresponds to a histogram in which all but one of the
bars contain no counts; this upper limit for the sum of the squares is therefore simply the square
of the area, A². The smaller limit corresponds to a histogram in which all of the bars have equal
numbers of counts; in this case the lower limit turns out to be A²/n.



Figure 8-2-Relationship between area and sum of 
squares for a histogram with 16 bars 


These facts are illustrated in Figure 8-2, which shows the relationship between the area and the
sum of the squares for histograms of 16 bars. Histograms having 16 bars (with area less than 2¹⁹
counts) lie within the cross-hatched region between the two parallel lines. For example, a 16-bar
histogram with an area of 2⁶ must have its sum of squares between 2⁸ and 2¹². This absolute
constraint is the basic fact exploited by the telemetry technique we wish to describe.
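Both limits are standard inequalities: the lower one follows from the Cauchy-Schwarz inequality and is reached when all bars are equal, and the upper one follows from the counts being nonnegative and is reached when a single bar holds every count. A minimal sketch, with illustrative names, checks them on the 16-bar example of area 2⁶.

```python
from typing import List, Tuple


def sum_of_squares_bounds(area: int, n: int) -> Tuple[float, int]:
    """Lower limit A**2 / n (all bars equal) and upper limit A**2 (one nonzero bar)."""
    return area * area / n, area * area


def within_bounds(bars: List[int]) -> bool:
    """Check that a histogram's sum of squares lies between the two limits."""
    area = sum(bars)
    lo, hi = sum_of_squares_bounds(area, len(bars))
    s = sum(c * c for c in bars)
    return lo <= s <= hi


if __name__ == "__main__":
    print(sum_of_squares_bounds(2 ** 6, 16))  # the example in the text: 2**8 and 2**12
```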

Hardware Use of This Relationship 

If the quantity A is known, then, according to the preceding section, the sum of the squares, Σ, is
known to within a factor of n. In other words, the location of the most significant bit of Σ is
known to within log₂ n bit positions. Specifically, for a 16-bar histogram the most significant bit
must lie within one of four positions. If for this 16-bar histogram A is 2⁶ counts, then the most
significant bit of Σ is either the 9th, 10th, 11th, or 12th bit of the word. The telemetry technique
being expounded is simply the transmission of A and of log₂ n bits of Σ. More bits of Σ can be
transmitted if more accuracy is desired. Figure 8-3 is a block diagram of the equipment designed to
implement this technique for the IMP plasma experiment.

The area counter at the bottom of the diagram commutates four bits of the sum-of-squares counter to
the telemetry system. Note that in this application a logarithmic counter (Reference 3) is used for
the area determination. Therefore, although the total area of the collected histogram can be as
large as 2¹⁹ counts, only eight bits are required to represent this number to ±3 percent accuracy.
Transmission of the 12 indicated bits (eight for the area and four for the sum of the squares)
allows one to determine the sum of the squares to, at worst, ±33 percent. This greatest error occurs
when the four commutated bits of the sum-of-squares counter are 0001. When these four bits are 1111,
the worst error is ±3.3 percent.
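The bit selection can be sketched as follows, under an interpretation and simplifications that are assumptions rather than the flight logic: the area is taken as exact (the flight unit uses a logarithmic counter), the transmitted field is the group of bits starting at the lowest position the most significant bit of the sum of squares can occupy, and the receiver estimates the sum as the midpoint of the interval the field leaves open. With that estimator, field 0001 gives ±33 percent and field 1111 gives about ±3.2 percent, close to the figures quoted above; the exact value depends on how the estimate is formed.

```python
from typing import Tuple


def sum_squares_field(s: int, area: int, n: int, nbits: int) -> Tuple[int, int]:
    """Select the nbits of the sum of squares s that start at its lowest possible MSB.

    Since s >= area**2 / n, the most significant bit of s can sit no lower than
    shift = floor(log2(area**2 / n)). Returns (field, shift).
    """
    shift = max(0, (area * area // n).bit_length() - 1)
    field = s >> shift
    if field >= 1 << nbits:
        raise ValueError("s does not fit in nbits above the shift")
    return field, shift


def reconstruct(field: int, shift: int) -> Tuple[float, float]:
    """Midpoint estimate of s and its worst-case relative error."""
    lo = field << shift
    hi = (field + 1) << shift
    estimate = (lo + hi) / 2
    return estimate, (hi - lo) / 2 / estimate


if __name__ == "__main__":
    area, n = 64, 16
    for s in (256, 3940):  # all bars equal; a much more peaked histogram
        field, shift = sum_squares_field(s, area, n, nbits=4)
        print(format(field, "04b"), reconstruct(field, shift))
```

In this sketch the single-bar extreme, where the sum of squares equals A² exactly and A is a power of two, would need a fifth bit and is rejected.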

The remainder of this paper is theoretical. It assumes that the area and the sum of the squares are
known exactly and discusses the consequences of that assumption. The results obtained may also be
applied to cases where the area and the sum of the squares are not known exactly.


INFORMATION CONTENT OF THE AREA 
AND THE SUM OF SQUARES 

Definition of Ratio r 



Figure 8-3-Block diagram of processing equipment (sum-of-squares and area outputs to telemetry)


A quantity denoted by the symbol r, which we shall call the "ratio," is defined as

$$ r = \frac{n\sum_{i=1}^{n} C_i^{2}}{\left(\sum_{i=1}^{n} C_i\right)^{2}} \qquad (1) $$

Here n denotes the number of bars in the histogram and C_i denotes the number of counts in the ith
largest bar, so that the denominator is the square of the area A. This parameter is related to the
mean μ = A/n and the variance σ² of the bar heights by the equation

$$ r = 1 + \frac{\sigma^{2}}{\mu^{2}} \qquad (2) $$

but has the advantage of assuming only values between 1 and n. For this reason, it is a useful
parameter for describing various properties of histograms.
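Taking the ratio as r = n ΣC_i²/A², a reading consistent with its stated range of 1 to n and with the bounds that follow, a short sketch computes it and checks the relation r = 1 + σ²/μ², where μ = A/n and σ² is the population variance of the bar heights. The example histogram is hypothetical.

```python
from statistics import mean, pvariance
from typing import List


def histogram_ratio(bars: List[float]) -> float:
    """r = n * (sum of squares) / area**2; equals 1 for equal bars and n for a single bar."""
    n = len(bars)
    area = sum(bars)
    return n * sum(c * c for c in bars) / (area * area)


if __name__ == "__main__":
    bars = [1, 3, 8, 20, 8, 3, 1, 0]           # hypothetical peaked histogram
    r = histogram_ratio(bars)
    mu, var = mean(bars), pvariance(bars)
    print(r, 1 + var / mu ** 2)                 # both forms give the same value
```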


Peak Height 

The largest bar of a histogram can be no greater than the area A and no smaller than one nth of the
area, A/n, where n is the number of bars in the histogram. If the largest bar has the value A, then
all other bars must have the value zero. If the largest bar has the value A/n, all others must have
this same value. These two extreme cases correspond to ratios of n and of 1, respectively.

For any given ratio, the largest bar of a histogram can assume only values in a subinterval of the
interval between A/n and A. For a given ratio r, the greatest possible value for the largest
histogram bar is given by

$$ \frac{A}{n}\left[1 + \sqrt{(n-1)(r-1)}\right] \qquad (3) $$
This expression reduces to A when r = n and to A/n when r = 1. Therefore, the expression for the
largest histogram bar agrees with the known histograms for the extreme values of r.

If the largest bar of a histogram with ratio r is equal to the value given by Expression 3, that
histogram is completely determined: all of the remaining n - 1 bars are equal. These bars must
therefore be equal to the value of the expression

$$ \frac{A}{n}\left[1 - \sqrt{\frac{r-1}{n-1}}\right] \qquad (4) $$

which is merely the area not included in the large bar divided by n - 1. Figure 8-4 shows several
5-bar histograms, each having the greatest largest bar possible for the indicated r. Note that as r
decreases, the largest bar decreases and the base of the histogram increases.

The ratio r of any histogram must lie in the interval between 1 and n. Divide this interval into
n - 1 subintervals (or s-intervals) and label these intervals from 2 to n, as shown in Figure 8-5.
The s-interval contains the ratios between n/s and n/(s - 1). The integer s is the smallest number
of nonzero bars possible for histograms with ratios in the s-interval. For example, if r = 1, then
s = n, since 1 lies in the interval bounded by 1 and n/(n - 1); therefore, all n bars must have
nonzero values for a ratio of 1. As another example, if n = 16 and r = 5, then s = 4, since 5 lies
between 16/4 and 16/3; therefore, this 16-bar histogram must have at least 4 nonzero bars. There is
no corresponding upper limit on how many nonzero bars a histogram may have.



Figure 8-4-Histograms (n = 5) with the largest peak height for the indicated ratios

Figure 8-5-A segment of the real number line illustrating the definition of s-intervals (ratio axis
marked at 1, n/(n - 1), n/5, n/4, n/3, n/2, and n)


The smallest possible value for the largest bar is

$$ \frac{A}{s}\left[1 + \sqrt{\frac{rs - n}{n(s-1)}}\right] \qquad (5) $$


When r = n we have s = 2, and the expression reduces to A. When r = 1 we have s = n, and the
expression reduces to A/n. Therefore, this expression agrees with the known histograms for the
extreme values of r.


If the largest bar of a histogram with ratio r is equal to the value given by Expression 5, that
histogram is completely determined. It is a histogram with s nonzero bars, s - 1 of which are equal
to the value of Expression 5. The remaining bar has the value

$$ \frac{A}{s}\left[1 - \sqrt{\frac{(s-1)(rs - n)}{n}}\right] \qquad (6) $$
Figure 8-6 shows several 5-bar histograms, each having the smallest largest bar possible for the
indicated r. Note that the largest bars decrease as r decreases and that the smaller bar increases
as r decreases until it equals the largest bars. Then, as r continues to decrease, a new bar
increases from zero until it equals the larger bars again. This process continues until we have n
equal bars.

Figure 8-7 illustrates, for 16-bar histograms, how the largest histogram bar must lie within the
crosshatched region between the two bounds.

Figure 8-8 is a plot, for 16-bar histograms, of the largest possible plus-or-minus percentage error
in determining the largest histogram bar as a function of the ratio. Note that for the sharply
peaked curves the percentage error is quite small. The maximum percentage error of ±43 percent
occurs for a histogram with a ratio of slightly less than 2. This corresponds to a flat curve, where
the peak height is of lesser significance than it is for the very peaked curves of the higher
ratios.
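The two bounds on the largest bar can be evaluated from A, n, and r alone. The sketch below uses the forms of Expressions 3 and 5 given above, picks s as the s-interval containing r, and checks on random integer histograms that the true largest bar always falls between the bounds. It also assumes one plausible reading of Figure 8-8's plus-or-minus error: half the width of the bound interval divided by its midpoint. Under that reading the sketch finds a maximum of 3/7 (about 43 percent) for n = 16, at r = 16/10 = 1.6.

```python
import math
import random
from typing import List, Tuple


def ratio(bars: List[int]) -> float:
    n, area = len(bars), sum(bars)
    return n * sum(c * c for c in bars) / (area * area)


def s_interval(r: float, n: int) -> int:
    """Index s of the s-interval [n/s, n/(s-1)] that contains r (s runs from 2 to n)."""
    return min(n, max(2, math.ceil(n / r)))


def largest_bar_bounds(area: float, n: int, r: float) -> Tuple[float, float]:
    """Lower bound (Expression 5) and upper bound (Expression 3) on the largest bar."""
    s = s_interval(r, n)
    lower = area / s * (1 + math.sqrt(max(0.0, (r * s - n) / (n * (s - 1)))))
    upper = area / n * (1 + math.sqrt(max(0.0, (n - 1) * (r - 1))))
    return lower, upper


def relative_error(n: int, r: float) -> float:
    """Half-width of the bound interval over its midpoint (one reading of Figure 8-8)."""
    lo, hi = largest_bar_bounds(1.0, n, r)
    return (hi - lo) / (hi + lo)


if __name__ == "__main__":
    rng = random.Random(7)
    bars = [rng.randint(0, 40) if rng.random() < 0.4 else 0 for _ in range(16)]
    if sum(bars) > 0:
        print(max(bars), largest_bar_bounds(sum(bars), 16, ratio(bars)))
    print(relative_error(16, 1.6))
```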


Amplitude Distribution 


Another quantity which can be deduced from a knowledge of the area and the sum of the squares of a
histogram is the amplitude distribution. In fact, one can easily obtain bounds for each bar of a
histogram that are similar to those for the largest one. Let C_1 refer to the largest bar of a
histogram, C_2 to the second largest bar, and so forth, so that C_p refers to the pth largest bar of
the histogram. In this notation, for a histogram with n bars, C_n is the smallest bar of the
histogram.

The largest possible value for the pth largest bar of a histogram with area A and ratio r must be
computed in one of two ways, depending on the value of the ratio. If the ratio is between n/p and
n, the largest possible value for the pth largest bar (p ≥ 2) is given by

$$ \frac{A}{p}\left[1 - \sqrt{\frac{rp - n}{n(p-1)}}\right] \qquad (7) $$




Figure 8-6 - Histograms (n = 5) with the smallest peak height for the indicated ratios

Figure 8-7 - Bounds for the largest histogram bar (n = 16); horizontal axis: ratio, r

Figure 8-8 - Percentage error in the largest histogram bar vs. ratio
If the ratio is between 1 and n/p, the largest possible value for the pth largest bar is given by



If p = 1, the entire range of ratios is covered by Expression 8, which reduces to Expression 3 when 1 is substituted for p.


When the ratio is equal to n/p (except for p = 1), Expressions 7 and 8 give identical results, namely, that the largest possible value for the pth largest histogram bar is exactly A/p. This is the largest value that the pth largest bar of a histogram can have for any ratio. It corresponds to a histogram with exactly p equal, nonzero bars.
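The two regimes for the upper bound on C_p can be computed directly. This sketch takes r = nS/A² (S the sum of squares) and uses forms derived from the extremal histograms: for r ≤ n/p, p equal bars with the remaining n - p bars equal and lower; for r > n/p, one tall bar, p - 1 equal bars, and n - p empty bars. These forms reproduce the limiting values stated in the text (A/n at r = 1, A/p at r = n/p, zero at r = n for p ≥ 2, and a single formula over the whole range when p = 1), but they are a reconstruction and may be written differently from the paper's Expressions 7 and 8. The function names are hypothetical.

```python
import math

def upper_bound_pth(p: int, r: float, n: int, A: float = 1.0) -> float:
    """Largest possible value of C_p, the pth largest bar of an n-bar histogram
    with area A and ratio r = n*S/A**2 (S = sum of squares; assumed definition).
    Requires 1 <= p <= n and 1 <= r <= n. Derived forms, not quoted from the paper:
      r <= n/p: p equal bars of height x, the other n - p bars equal and lower
      r >  n/p: one tall bar, p - 1 bars of height x, and n - p empty bars
    """
    if r <= n / p:
        return (A / n) * (1 + math.sqrt((n - p) * (r - 1) / p))
    # r > n/p is only possible for p >= 2, because r never exceeds n.
    return (A / p) * (1 - math.sqrt((p * r - n) / ((p - 1) * n)))

def ratio(bars: list[float]) -> float:
    """r = n * (sum of squares) / area**2 for a histogram given as a list of bars."""
    return len(bars) * sum(b * b for b in bars) / sum(bars) ** 2

# Compare each ranked bar of an arbitrary 8-bar histogram with its upper bound.
bars = [5, 3, 3, 1, 0, 0, 0, 0]
r = ratio(bars)
for p, c in enumerate(sorted(bars, reverse=True), start=1):
    print(p, c, round(upper_bound_pth(p, r, len(bars), sum(bars)), 3))
```

The histogram [9, 1, 0, 0, 0, 0, 0, 0] is itself of the second extremal type, so its second bar meets the bound for p = 2 exactly.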


If the ratio is between 1 and n/(p - 1), the smallest possible value for the pth largest histogram bar (p ≥ 2) is

$$\frac{A}{n}\left[1-\sqrt{\frac{(p-1)(r-1)}{n-p+1}}\right] \qquad (9)$$

When the ratio is 1, Expression 9 reduces to A/n. This corresponds to the known extreme histogram for this ratio, namely the one with n equal bars. When the ratio is n/(p - 1), Expression 9 reduces to zero.
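Expression 9 and a histogram that attains it can be checked numerically. This is a minimal sketch, again assuming r = nS/A² (S the sum of squares); the attaining histogram is assumed to consist of p - 1 equal tall bars and n - p + 1 equal short bars, the configuration that pushes the pth bar as low as the area and ratio allow. The helper lower_extremal_histogram is hypothetical.

```python
import math

def lower_bound_pth(p: int, r: float, n: int, A: float = 1.0) -> float:
    """Expression 9: smallest possible value of the pth largest bar (p >= 2) of an
    n-bar histogram with area A and ratio r = n*S/A**2 (S = sum of squares,
    assumed definition). The bound is zero once r reaches n/(p - 1)."""
    if p < 2:
        raise ValueError("Expression 9 applies only for p >= 2")
    if r >= n / (p - 1):
        return 0.0
    return (A / n) * (1 - math.sqrt((p - 1) * (r - 1) / (n - p + 1)))

def lower_extremal_histogram(p: int, r: float, n: int, A: float = 1.0) -> list[float]:
    """Hypothetical helper: a histogram assumed to attain Expression 9, made of
    p - 1 equal tall bars and n - p + 1 equal short bars (for 1 <= r < n/(p - 1))."""
    x = lower_bound_pth(p, r, n, A)            # height of each short bar
    y = (A - (n - p + 1) * x) / (p - 1)        # remaining area split over the tall bars
    return [y] * (p - 1) + [x] * (n - p + 1)

h = lower_extremal_histogram(3, 4.0, 16)
print(round(sum(h), 6), round(16 * sum(v * v for v in h), 6), round(h[2], 6))
```

The printed area and ratio come back as 1 and 4, confirming that the configuration is consistent with the given A and r while its third largest bar equals the Expression 9 value.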

Figure 8-9 illustrates how the bounds on some of the histogram bars vary with ratio for the 
case where n = 16. Note that bounds on the largest bar were already illustrated in Figure 8-7. 



Figure 8-9 - Bounds for (a) the second largest histogram bar, C_2; (b) the third largest histogram bar, C_3; and (c) the pth largest histogram bar, C_p (n = 16 in each panel; horizontal axis: ratio, r)
For bars other than the largest, the least upper bound for the pth largest bar increases from A/n when r = 1 to A/p when r = n/p, and then decreases to zero when r = n. The greatest lower bound decreases from A/n when r = 1 to zero when r = n/(p - 1). The two extreme cases, r = 1 and r = n, are the only ones in which the histogram is completely determined.

Figure 8-10 illustrates the consequences of these results for the case where n = 16. For 
ratios with integer values, the bounds for each histogram bar are drawn with the largest bar on 
the left and the smallest bar on the right. In this form one can see at a glance how the shape of 
the resulting histograms varies as the ratio changes. Note that histograms with larger ratios are 
much steeper and narrower than those with lower ratios. 


FURTHER HARDWARE CONSIDERATIONS 

In the preceding section, the information content of the area and sum of squares of a histogram was discussed on the assumption that these quantities are known exactly. In practice, however, as in the IMP plasma experiment, they will not be known exactly; for a given set of bits they will be known only to lie between certain bounds. For example, suppose the area A is known to lie between A_min and A_max and the sum of squares Σ is known to lie between Σ_min and Σ_max. Then the ratio r must satisfy the bounds

$$\frac{n\,\Sigma_{\min}}{A_{\max}^{2}} \le r \le \frac{n\,\Sigma_{\max}}{A_{\min}^{2}} \qquad (10)$$

Similarly, bounds for any histogram bar could be found by choosing the largest upper bound and the 
smallest lower bound from among those for all ratios in the range of Expression 10. 
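Expression 10 and the envelope step can be sketched as two small functions. This assumes r = nS/A² (S the sum of squares) and clips the interval to [1, n], the range in which any n-bar histogram's ratio must lie; the envelope is only a numerical approximation on an evenly spaced grid over the ratio interval, and both function names are hypothetical.

```python
from typing import Callable

def ratio_interval(A_min: float, A_max: float, S_min: float, S_max: float,
                   n: int) -> tuple[float, float]:
    """Expression 10: range of r = n*S/A**2 (assumed definition) when the area A
    lies in [A_min, A_max] and the sum of squares S lies in [S_min, S_max].
    Clipped to [1, n], where every n-bar histogram's ratio must lie."""
    r_lo = n * S_min / A_max ** 2
    r_hi = n * S_max / A_min ** 2
    return max(1.0, r_lo), min(float(n), r_hi)

def envelope(bound: Callable[[float], float], r_lo: float, r_hi: float,
             steps: int = 1000) -> tuple[float, float]:
    """Smallest and largest values of bound(r) on an evenly spaced grid over
    [r_lo, r_hi]: a numerical stand-in for the exact extremes."""
    values = [bound(r_lo + (r_hi - r_lo) * k / steps) for k in range(steps + 1)]
    return min(values), max(values)

print(ratio_interval(100, 110, 1000, 1100, 16))
```

For a particular bar, one would take envelope(lower, r_lo, r_hi)[0] as the lower bound and envelope(upper, r_lo, r_hi)[1] as the upper bound, where lower and upper are the per-ratio bound functions for that bar. Because every bar bound scales linearly with A, the lower bound would be evaluated with A_min and the upper bound with A_max.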

When one uses the flight hardware for the IMP plasma experiment, which is designed for a maximum of 2^17 counts in any histogram bar and a maximum total area over 16 bars of 2^19 counts, the ratio r is always determined to better than ±40.0 percent of its range and, on the average, to about ±12.0 percent.

Table 8-1 is a segment of a computer program output which relates the output of the IMP 
flight hardware to the input data which produced the specified output. For example, if the log 
counter is 189 (275 octal), the area of the input histogram lies between 29,696 counts and 30,719 
counts. If, furthermore, the squarer output is 12, the ratio of the input histogram must be between 
13.64 and 15.85 and the largest bar of this input histogram is between 27,312 counts and 30,571 
counts. For each of these quantities the harmonic mean (H. M.) of the range and the maximum plus 
or minus percentage error (P.E.) are also listed. 
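The Table 8-1 example can be approximately reproduced from the bounds on the largest bar. This sketch assumes r = nS/A² (S the sum of squares), an upper bound of (A/n)[1 + sqrt((n - 1)(r - 1))], a lower bound equal to the smallest possible peak, (A/s)[1 + sqrt((rs - n)/((s - 1)n))] with s = ceil(n/r), and a P.E. defined as the worst-case relative error of the harmonic mean over a range [a, b], which is ±(b - a)/(a + b); the program's own formulas are not given in the text. Both bounds grow with r and with A, so the lower bound uses the smallest area and ratio and the upper bound the largest.

```python
import math

def largest_bar_bounds(r: float, n: int, A: float) -> tuple[float, float]:
    """(lower, upper) bounds on the largest bar for area A and ratio r = n*S/A**2
    (assumed definition). Lower: flattest histogram (s - 1 equal bars plus one
    smaller bar). Upper: one tall bar with the other n - 1 bars equal."""
    upper = (A / n) * (1 + math.sqrt((n - 1) * (r - 1)))
    s = math.ceil(n / r)
    if s == 1:
        return A, upper
    lower = (A / s) * (1 + math.sqrt(max(0.0, (r * s - n) / ((s - 1) * n))))
    return lower, upper

def harmonic_mean(a: float, b: float) -> float:
    return 2 * a * b / (a + b)

def plus_minus_percent(a: float, b: float) -> float:
    """Assumed P.E.: worst-case relative error of harmonic_mean(a, b) as an
    estimate of any value in [a, b]; equal in size at both ends."""
    return 100 * (b - a) / (a + b)

n = 16
area_lo, area_hi = 29_696, 30_719   # log counter 189 (275 octal)
r_lo, r_hi = 13.64, 15.85           # ratio range for squarer output 12, as printed
peak_lo = largest_bar_bounds(r_lo, n, area_lo)[0]
peak_hi = largest_bar_bounds(r_hi, n, area_hi)[1]
print(f"largest bar: {peak_lo:.0f} to {peak_hi:.0f} counts, "
      f"H.M. {harmonic_mean(peak_lo, peak_hi):.0f}, "
      f"P.E. +/-{plus_minus_percent(peak_lo, peak_hi):.1f}%")
```

With the ratio endpoints rounded to two decimals as printed, this gives about 27,315 and 30,575 counts, within a few counts of the 27,312 and 30,571 listed. The harmonic mean is a natural choice of estimate here because its relative error is the same size at both ends of the range.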


CONCLUDING REMARKS 

The present paper has shown how transmission of the area of a histogram and certain bits of the sum of the squares enables one to recover much of the original histogram. It has shown how knowledge of the area of the histogram implies a restriction on the sum of the squares of the histogram. This fact is the backbone of the entire process, for it enables the sum of the squares to be transmitted with fewer bits if the area is also transmitted. This paper has also shown how knowledge of both the area and the sum of the squares of a histogram implies restrictions on each of the bars of the histogram.

As spacecraft travel farther away from Earth, the need for onboard processing of data will increase. The IMP plasma experiment is an example of a situation where onboard processing is necessary so that the desired goals can be accomplished. The computation discussed in this paper has been able to increase the effective amount of information that can be transmitted to Earth from this experiment by more than an order of magnitude.

Figure 8-10 (not reproduced; vertical axes: AREA, A, in arbitrary units)

Table 8-1 (excerpt): log counter 189 (275 octal); area from 29,696.00 to 30,719.00 counts; H.M. 30,198.64
9. USE OF A STORED-PROGRAM COMPUTER ON A SMALL SPACECRAFT


R. A. Cliff 

Goddard Space Flight Center 
Greenbelt, Maryland 


SMALL SCIENTIFIC SPACECRAFT 


N67-27411 


The primary purpose of a scientific spacecraft is to provide a platform from which observations and measurements can be made. The ideal scientific spacecraft would minimize the hardware requirements imposed on each experiment package. The majority of scientific spacecraft are small, weighing perhaps 150 pounds, and are spin-stabilized about the axis of maximum moment of inertia. Since many of the experiment sensors are directional, the spinning spacecraft causes them to perform a circular scan, which is frequently desirable. The spinning does pose a problem, however, in that the experimental data are most meaningful when collected in synchronism with the spacecraft spin, whereas the time-division-multiplex telemetry system commutates the experiments at a fixed rate that is not related to the spin rate.

Another problem is that the raw data from the many experiments that measure random events (energetic particle experiments, for instance) require a wide bandwidth but have a low data content. Raw data from other experiments may not require as much bandwidth, but their average data rate may be higher. In either case, the raw data would impose a severe burden on the telemetry transmitter if all of them were transmitted. For this reason, devices have been included on board scientific spacecraft to accomplish buffering and simple data compression, such as logarithmic counting of particle events.


Traditionally, the processing of raw data on board spacecraft has been accomplished by special-purpose equipment within each individual experiment package. Unfortunately, as computational complexity and sophistication have increased, it has become increasingly difficult for the experimenter to develop an experiment system successfully before it is obsolete. In addition, much duplication of capability occurs throughout the spacecraft, wasting power, weight, and space that a small spacecraft cannot afford.

To improve this situation, a small, programmable, general-purpose digital computer could be made part of the spacecraft electronics, with many advantages. The computer would do experimental data computations, data compression, and buffering. The experiment packages would then need to consist of sensors and signal conditioners only. Not only should the overall performance of the spacecraft be improved, but the time and manpower necessary to construct and check out an experiment would also be reduced.


SPACECRAFT DATA SYSTEMS 

Early Interplanetary Monitoring Platform (IMP) Spacecraft 


The data system found, for instance, on the IMP series of spacecraft should be examined briefly. Figure 9-1 is an idealized view of a data system that was used on the early IMP spacecraft (IMP's A, B, and C). Note that the experiments are connected to the transmitter through a subsystem labeled "Telemetry Encoder." Data processing, if any, is performed in the experiments. (There is one exception: the telemetry encoder contains accumulators, not shown, which count pulse data from certain experiments. This is a rudimentary form of data processing.)
Figure 9-1 - Early IMP data system (blocks: experiments, telemetry encoder)

Figure 9-2 - Present IMP data system (blocks: experiments, telemetry encoder)
Inside the telemetry encoder are a number of PFM oscillators (Reference 1), one for each experiment. Each oscillator encodes the output data from its associated experiment as a frequency, and these frequencies are switched sequentially to the transmitter, one at a time, by the commutator. Synchronization pulses for the experiments are also supplied by the commutator.

For experiments that have analog outputs, the PFM oscillator is voltage-controlled. For experiments with digital outputs, a digitally controlled oscillator is used that produces a discrete frequency for each input state. In either case, the phase of the output is uncontrolled and the frequency is determined only approximately. This sort of system is not optimum for a number of reasons.

Present IMP’S 

Performance of the system can be improved by using a phase-coherent digital PFM oscil- 
lator for all experiments. Such an oscillator is available (Reference 2) and will be used on IMP’S 
D, E, F, and G. The new, more complete oscillator replaces the many oscillators of the past by 
a single unit. Data are commutated to the single oscillator which then feeds the transmitter 
directly. 

This new method has a disadvantage: if the single oscillator fails, all the data are lost. The advantages far outweigh it, however, for the new oscillator can be made more reliable than the old oscillators, and fewer components are required. A further advantage is that the performance of the telemetry system using the new oscillator closely approaches the theoretical limits of the PFM oscillator.

Figure 9-2 shows the data system used on present IMP spacecraft (IMP’S D, E, F, and G). 
Except that the PFM oscillator and commutator functions have been interchanged within the telem- 
etry encoders, it is like the data system used on the early IMP’S. The PFM oscillator has been 
alternatively designated "channel encoder" to emphasize that, although a PFM oscillator is used, 
any type of channel encoder could be substituted (for instance, the pseudonoise PCM type). (The 
term "channel encoder" is used by communications specialists to describe a device used to encode 
data expeditiously in order to combat noise in the communication channel.) 

Everything that the "telemetry encoder" does, exclusive of channel encoding, has been consolidated into the box called "commutator" in Figure 9-2. Besides commutation, this box performs analog-to-digital conversion (necessary now that analog voltage-controlled oscillators are no longer used), accumulation of pulse data, and generation of synchronization pulses.


Future Spacecraft 


The next logical step in the improvement of the spacecraft data system is shown in Figure 
9-3. Again the block containing the commutator has been moved to the left. This time the mul- 
tiple data processors in the various experiments have been supplanted by a single box labeled 
"computer." As before, improved performance is to be obtained by consolidating many similar 
functions scattered throughout the spacecraft into a single more efficient function. Many advan- 
tages of the centralized computer have already been discussed. 

It is proposed that the basic spacecraft configuration shown in Figure 9-3 be used in future 
spacecraft. Certain refinements should be added, however. Figure 9-4 shows an improved space- 
craft data system using a centralized computer that would be suitable for spacecraft such as the 
proposed Omnibus IMP’S. This system has two separate encoders, each of which performs the 
same function as the encoder in Figure 9-3. The two encoders in Figure 9-4 are identical except 
for their sources of synchronization. 
Figure 9-3 - Data system with computer

Figure 9-4 - Use of spin-synchronized commutator (labeled: sync pulses to experiments)
The lower encoder (labeled "clocked encoder") operates in the conventional manner and is controlled by a fixed clock. This encoder handles the data from the experiments with nondirectional sensors. It performs the usual commutation, analog-to-digital conversion, accumulation, and synchronization pulse generation. Data from the clocked encoder go to the computer for redundancy removal, data reduction and analysis, and formatting before being sent on to the channel encoder.

The significant feature that makes the spacecraft data system configuration of Figure 9-4 superior to that of Figure 9-3 is the upper encoder. It is synchronized to the spacecraft spin rate by Sun pulses supplied from the optical aspect system. For each revolution of the spacecraft, the spin-synchronized encoder goes once through its format. The advantages for experiments with directional sensors are considerable. Because the entire operation of such an experiment is now synchronized with the spacecraft spin by pulses from the encoder, the experiment need contain no special provisions to collect certain data during particular portions of a spin. Also, data can be collected on every spin, instead of being collected during one spin and then held while the telemetry system reads them out. Furthermore, the computer does the buffering between the variable spin rate and the fixed telemetry rate.

Additional possibilities exist in the ability to synchronize the operation of any (or all) of the 
directional experiments to any particular direction (to the Earth, Moon, or Sun, for example). It 
would also be possible for the spin-synchronized encoder to complete one format for any given 
integral number of spins. For example, if there were five directional experiments, each of which 
collected large amounts of data during a spin, it would be possible to allocate to each experiment 
a spin of its own during which it could utilize the full capability of the encoder and computer. In 
that case five spins would be required to complete a format. 

A few more comments about Figure 9-4 are in order. If the directional stimulus (in this case the Sun) that controls the spin-synchronized encoder is lost, as, for instance, during an eclipse, the encoder should free-run at the rate it had before the loss of stimulus. Techniques for accomplishing the required synchronization characteristics have already been developed for the IMP spacecraft series (Reference 3).
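The free-running behavior described above can be illustrated with a simple timing model. This is a hypothetical sketch, not the IMP technique of Reference 3: the class name, the sector count, and the rule for spreading a long gap between Sun pulses over a whole number of spins are all assumptions. Each spin, delimited by Sun pulses, is divided into a fixed number of sectors; when the pulses stop, the sector timing continues at the last measured spin period.

```python
from typing import Optional

class SpinSectorClock:
    """Hypothetical spin-synchronized timing model: each spin, delimited by Sun
    pulses, is divided into a fixed number of sectors. If Sun pulses stop (for
    example, in an eclipse), the clock free-runs at the last measured period."""

    def __init__(self, sectors_per_spin: int, nominal_period: float) -> None:
        self.sectors = sectors_per_spin
        self.period = nominal_period            # spin-period estimate, seconds
        self.last_pulse: Optional[float] = None

    def sun_pulse(self, t: float) -> None:
        """Record a Sun pulse at time t and update the spin-period estimate."""
        if self.last_pulse is not None:
            elapsed = t - self.last_pulse
            # Assumption: a long gap means pulses were missed, so spread the
            # elapsed time over the nearest whole number of spins.
            spins = max(1, round(elapsed / self.period))
            self.period = elapsed / spins
        self.last_pulse = t

    def sector_at(self, t: float) -> int:
        """Sector index (0 .. sectors - 1) at time t, extrapolated from the last
        Sun pulse at the current period estimate (free-running if none since)."""
        if self.last_pulse is None:
            raise RuntimeError("no Sun pulse received yet")
        phase = ((t - self.last_pulse) / self.period) % 1.0
        return int(phase * self.sectors)

clock = SpinSectorClock(sectors_per_spin=16, nominal_period=0.5)
clock.sun_pulse(0.0)
clock.sun_pulse(0.5)
print(clock.sector_at(0.75), clock.sector_at(5.625))  # second time: 10.25 spins after the last pulse
```

The second query falls more than ten spins after the last Sun pulse, so the sector is obtained purely by free-running at the last measured period.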

Another improvement is that a master clock controls all subsystems that are not spin-synchronized. This is particularly desirable for the transmitter carrier and the channel encoder (and for the computer also), because a carrier-coherent telemetry system has synchronization characteristics superior to those of a noncoherent system (Reference 4). It is also important that the clocked encoder and the channel encoder be controlled by a common clock; otherwise, the computer must perform an unnecessary additional buffering function.
10. UNIFIED INFORMATION PROCESSING TELEMETRY SYSTEM


C. J. Creveling 
Goddard Space Flight Center 
Greenbelt, Maryland 


N67-27412


Dr. Robert H. Goddard said, "It is difficult to say what is impossible, for the dreams of yesterday are the hope of today and the reality of tomorrow." The system to be described in this paper shows potential for approaching the "dream system" discussed by Cliff in a preceding paper of this symposium* more closely than might have been thought possible just a short time ago. Consider a few statistics**. First, there were 18 satellite launches in 1966, and more than 60 are planned between now and 1970, including backups. Second, one of the earliest satellites, the Vanguard, weighed on the order of 10 pounds, while the OGO satellite weighs over 1,000 pounds, a spread of two orders of magnitude. Third, bit rates of the earliest Goddard satellites (and even some of the present ones) run from as low as 20 bits per second to as high as 128,000 bits per second, a spread of about four orders of magnitude. The design-goal error rates initially specified for this sample of satellites range from 10^-2 to 10^-8, or six orders of magnitude. It is important to keep these facts in mind when adaptive systems are discussed, because no present system can cope with these ranges of variation.


A unified information processing telemetry system will consist, basically, of a programmable 
computer onboard the spacecraft and an adaptive telemeter in which coding, power, bit rate, and 
format are controlled. It will include a programmable ground system which will have at least a 
capability of "talking back" to the satellite. The writer believes the term "adaptive telemetry" has 
its widest meaning in this sense. A multidisciplinary approach is needed to solve problems that cover such a wide range. One cannot take the position of a telemetry engineer who considers his system a common carrier and deals with its inputs and outputs according to some specification.

Figure 10-1 depicts a system for a simple laboratory experiment that requires one system 
engineer - the experimenter. Figure 10-2 shows what happens when the output of a sensor is telem- 
etered from some distance away because of safety considerations or the impossibility of having the 
experimenter present. Here a second man comes into the picture - the subsystem engineer. 


Figure 10-1 - An experiment in the laboratory (system engineer: experimenter)

Figure 10-2 - An experiment on the range (blocks: sensor, telemeter, instrumentation, display; system engineer: experimenter; subsystem engineer: telemetry instrumentation)


*Paper 9, pp. 129 to 133.

**From "Major Space Projects at Goddard," internal GSFC document (Official Use Only).
Figure 10-3 shows a still more complex system, which requires that an increasing number of people become involved in the design. Now the problem can be viewed as a whole
and information systems can be discussed. Although data handling inside the subsystems is of 
concern, the meaning of the data as it passes through the system as a whole is of equal concern. 
This best illustrates what the term "information" means when an information system is discussed. 
The overall system has a number of subsystems and it would be very difficult to present in detail 
adequate specifications of the inputs, outputs, and interfaces between subsystems. Until each sub- 
system engineer concerns himself with the subsystems which adjoin his, and indeed with the whole 
satellite plan, adequate control will be lacking. For example, consider an experimenter looking at 
the pulsed output of a sensor in a simple experiment. Initially, the experimenter would like to see 
not only the pulse but the wiggles on the pulse to establish in his mind the integrity of the instru- 
mentation. Only after he has been convinced that his experiment is indeed working properly is he 
interested in the pulse value. If the system gives him this value and nothing else, he will still 
question the integrity of the experiment. If the experimenter is asked what he would like to have 
in a telemetry system, he might say, "Give me 5 megacycles!" It is believed that it is possible, 
with the use of onboard processing and an adaptive system, to satisfy both of these requirements 
in the sense that they can be controlled from the ground. To do this an initial learning period is 
provided in which any or all of the experiments can be sampled successively at very high rates in 
order to establish confidence in the system, and then only the desired information is transmitted. 


Figure 10-3 - Subsystems used in a coupled experiment (blocks: sensors, processor, telemeter, data reduction, analysis; spacecraft, telemetry, and ground subsystems; personnel: project manager, project scientist, tracking scientist, data processing engineer, telemetry engineer, etc.)


A great deal has been said about coding as a technique used in various communications systems. Many of these codes, however, have been generated to solve specific problems, whereas our concern here is with error rates covering six orders of magnitude. It is very difficult to imagine any single set of codes that would satisfy the requirements over this range. However, once there is a computer on board the spacecraft and we have the means for controlling it, the computer becomes a very useful adjunct for coding the telemetry link. Therefore, the computer belongs as much to the telemetry engineer as it does to the onboard processing system.

The GSFC Information Processing Division is proposing a system over whose design it must exercise a measure of unified control. Unified control means that all of the subsystems will be under a single administration, which will provide fairly close control over both the administrative and technical problems involved in developing all the various subsystems, from the spacecraft to the ground support. However, an interdisciplinary approach immediately raises the semantic problem of communication between people who at times use the same terms with different meanings, or who use terms unfamiliar to one another. It is difficult, for example, for a person who is not used to working with computers to understand what goes on inside the computer. A thorough understanding of a computer program can be obtained neither by looking at the punch cards nor by looking at the sequence of machine instruction codes. In fact, it is doubtful that a complete understanding can be obtained by reading the program in some language such as Fortran. To get this understanding, not only must the program be expressed in some relatively higher order language, but an English description of it and a flow diagram are also necessary. A number of so-called higher order languages, which can describe very complicated systems very precisely, have been developed for computer programmers to replace or supplement flow charts and English descriptions. For example, the logic structure of the IBM System/360 has been completely specified in a higher order language in approximately a dozen pages. The programming that takes place on the IMP-F satellite, which is a relatively small system, was described by Dr. E. P. Stabler* in this higher order language in a few sentences.

It is presently proposed to study and use such a higher order language for several reasons. First, it is most convenient for conveying information from one subsystem designer to another in a complete and unmistakable manner; second, this language lends itself to the writing of the actual programs that will be used in the computer on board the satellite and in the computers on the ground (although they are of different types); and, third, this language becomes a tool for designing the onboard computer, because it is possible to write the Boolean expressions for the functions expressed in the higher order language (in fact, this has already been started). From these expressions the logic diagrams can be made through a relatively straightforward process.

It is not clear just how useful this language will be to people outside the information system.

It would be naive to think that, because this wonderful thing has been developed and discovered, everybody will say, "Teach it to me." What system engineers have to do first is use it for their own benefit. However, these languages have been in vogue and widely distributed for several years now and are finding some use among programmers in general. Moreover, almost all satellite experimenters have been forced to become computer-oriented, so the increasing use of these languages is a natural trend.

The kind of machine that is being proposed here for spacecraft computers follows work that was done by Dr. E. P. Stabler at GSFC last summer. That work undertook the design of a stored-program computer that would perform the same functions as one of the IMP spacecraft processors, and concluded that such a computer could be designed with approximately the same size, weight, and power consumption as the processor and would be busy less than 10 percent of the time. Therefore, it
should be possible to build a computer for small satellites that would have a considerable amount 
of time left for things such as coding the telemetry link, changing the format of the data passing 
through the system, and possibly processing some of the information before it goes into the telem- 
etry link. If there is a computer on board the satellite, a computer on the ground, and a command 
system, a "dream system" starts to take shape - a dream system in which a spacecraft ultimately 
might be considered a piece of peripheral gear used in conjunction with a large ground-based com- 
puting system. This spacecraft computer would then perform experiment conditioning, multiplex 
processing, and coding for the telemetry link. In addition, the modularity in design necessary to 
overcome problems in reliability would enable the configuration to be changed from one mission 
to another. 

Figure 10-4 shows in a very simple manner a microprogrammed machine and a stored-program computer. These differ as shown in Table 10-1. The conventional computer is characterized by a fixed input format and a fixed word length, whereas in the microprogrammed machine a variable word length can be provided very inexpensively. The conventional computer has a fixed logic structure, a stored program, and a fixed instruction repertoire that can be set up on the ground. The microprogrammed computer differs in that the microprograms as well as the macroprograms are stored and are changeable. In the conventional computer, flexibility is provided by software and the capability for simultaneous operations is limited. In general, the conventional computer requires a large stored program memory or else a very extensive list of instructions, and it is also low-speed in the sense that many machine instructions have to be carried out for each so-called macroinstruction. The microprogrammed computer, however, circumvents many of these limitations by being, in effect, almost any machine that you want it to be; its instructions, in effect, rewire the machine for the problem at hand. The number of instructions that must be stored to do this is surprisingly modest, and the speed is very good.

Figure 10-4 - Computer configuration

Table 10-1 - Computer Characteristics

Conventional Computer                        | Microprogrammed Computer
---------------------------------------------+------------------------------
Fixed input format; fixed word length        | Variable input format
Fixed logic structure; stored program and    | Flexible internal structure
  fixed instruction repertoire; masking for  |
  bit manipulation                           |
Large stored program memory                  | Small stored program
Low speed (many instructions per macro)      | High speed
Flexibility provided by software;            |
  limited simultaneity                       |
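The contrast in Table 10-1 between a fixed instruction repertoire and stored, changeable microprograms can be illustrated with a toy interpreter. In this hypothetical sketch (not the GSFC design), the micro-operations stand in for fixed hardware, a writable control store maps each macroinstruction to a sequence of micro-operations, and defining a new macroinstruction only means writing a new control-store entry; all opcode and function names are illustrative.

```python
from typing import Callable

State = dict  # registers "acc" and "tmp", plus the memory list "mem"

# Micro-operations: the fixed "hardware" primitives of this toy machine.
def fetch(s: State, addr: int) -> None:
    s["tmp"] = s["mem"][addr]

def add(s: State, addr: int) -> None:
    s["acc"] += s["tmp"]

def store(s: State, addr: int) -> None:
    s["mem"][addr] = s["acc"]

def clear(s: State, addr: int) -> None:
    s["acc"] = 0

MICRO_OPS: dict[str, Callable[[State, int], None]] = {
    "fetch": fetch, "add": add, "store": store, "clear": clear,
}

def run(program: list[tuple[str, int]], control_store: dict[str, list[str]],
        memory: list[int]) -> list[int]:
    """Execute (opcode, operand) macroinstructions by expanding each opcode into
    the micro-operations listed for it in the writable control store."""
    state: State = {"acc": 0, "tmp": 0, "mem": list(memory)}
    for opcode, operand in program:
        for micro in control_store[opcode]:
            MICRO_OPS[micro](state, operand)
    return state["mem"]

# Instruction repertoire as first loaded (hypothetical opcodes).
control_store = {"CLR": ["clear"], "ADD": ["fetch", "add"], "STO": ["store"]}
print(run([("CLR", 0), ("ADD", 0), ("ADD", 1), ("STO", 2)], control_store, [3, 4, 0]))

# A new macroinstruction needs only a new control-store entry, not new hardware.
control_store["ADDSTO"] = ["fetch", "add", "store"]
print(run([("CLR", 0), ("ADD", 0), ("ADDSTO", 1)], control_store, [3, 4, 0]))
```

The first run leaves 3 + 4 = 7 in location 2; the second uses the newly defined ADDSTO to add and store in a single macroinstruction, leaving 7 in location 1.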

Development of this system is now progressing in a series of steps. The study phase is in 
process and a design phase has been started in which the tentative logic designs are being carried 
out. A breadboard is planned for testing the actual onboard subsystem. It is not necessary to build a computer on the ground, because a variety of suitable computers are available and they can be programmed in the higher order language. In order to see how this system will operate as a
whole, a computer simulation of the whole system has been planned. This is not going to be an 
easy job but it can be accomplished in parallel with the development and later on it will be inval- 
uable in reconfiguring the system according to what is learned from the simulation. Having a 
flexible onboard computer permits the feedback of simulation results and early experiences with 
the system. This will be followed by the development of a prototype and a flight model in the future. 
SUPPLEMENT QUESTION AND ANSWER SESSION


UNIDENTIFIED QUESTIONER: I have two things; one is in the nature of a comment. Although the computer is a very useful tool, I think you have to be careful not to use it for things that it is not eminently suited for. For instance, you mentioned perhaps doing channel coding in the computer, which in certain instances could be a good idea and certainly provides lots of flexibility. However, I think this is one thing that is quite efficiently done by fairly elegant hard-wired devices that have been developed, such as the PN system. To go a step further, if you are counting pulses from an experiment by interrupting the computer and having it add "one" to a register, you could instead include special accumulators to do this and then transfer the results entirely. You have to be somewhat careful what jobs you assign to a computer.

The other thing I have is a question about the microprogrammed machine. Do you have any feeling for the difference in complexity between the standard configuration and the microprogrammed configuration, especially considering that there is a box labeled switching matrix, which may grow to fairly considerable proportions?

MR. CREVELING: I think your remarks are certainly well taken and reflect my feeling on 
the subject of the software approach versus the hardware approach to specific problems within the 
spacecraft, such as coding the telemetry link. I mentioned the fact that the computer is available 
and that the computer is a very flexible device so that as you change from mission to mission, or 
if you change from very close ranges to very distant ranges as in some of our eccentric satellites, 
you have the ability then to change your type of coding. It is certainly true that if you have a fixed 
coding scheme you could build up hardware devices to do this very efficiently and very quickly. In 
such cases, we have generally followed the practice of investigating both approaches, trying it both 
ways (at least in the study phase) to decide which is the better, and making the choice on that basis, 
taking into account the fact that sometimes expediency calls your hand. I feel that the reason that 
the computer has advantages for doing this is not because of greater efficiency but because of the 
flexibility of the approach. 

Regarding your question as to the difference between the microprogrammed machine and the 
more conventional machine, we plan to try both of those, and compare them. At the moment I could 
not give you a comparison because we have not gotten that far in our study, but I think that the microprogrammed machine will compare favorably with the conventional approach.

MR. HABIB: In your discussion here a thought occurred to me. I wonder whether we have not missed the point (semantics again) in dealing with the word "computer" when we are really talking about a general-purpose data- or signal-handling system. You began your talk by discussing unified information systems and the designer's need, in a sense, for control over all of the elements in them. Then we very suddenly get down to a specific thing and call it a computer, and I think what flashes into everybody's mind is the type of computer that sits on our floors out here. I think you both were correct: you will use unique equipment where needed and general-purpose equipment where needed, and so long as you are allowed to design the entire system, you will achieve your goals.

MR. CREVELING: No argument.

DR. POSNER: If you were to use your computer to encode the telemetry, I envision that a failure in the central computer clock would cut off all information as to what happened and, in fact, by having all experiments channeled through one machine you may find that it becomes a no-go instead of an OGO spacecraft. In this case you may be very unhappy at having everything channeled through one point. We at JPL feel that decentralization will protect a mission. Now, our missions may be more expensive, and you may have to count on getting some use out of them. On a cheaper mission it may be that reliability is not as important. These are sort of political decisions, but in the very expensive, heavily equipped spacecraft for soft landing on Mars you have to count on getting some experiments back no matter what fails; so, in a really expensive mission where you have everything staked on it, I would hesitate to channel everything through one computer. On a smaller interplanetary spacecraft, where you have dozens of them in the warehouse, you may find it more economical to use the central computer. Incidentally, a possible use for a computer, once you have it, is to decode the commands from the ground. You may want very highly encoded commands to avoid having a miscommand decoded by the spacecraft, which might wipe out the mission. If you do have a general-purpose computer, that would be a really good use for it. Perhaps using a computer to decode spacecraft commands may be a better thing to do than using it to encode telemetry.

MR. CREVELING: That is one of those bad little nightmares that sometimes interrupts our 
dream system and we have thought of it. If you will notice in Mr. Purcell's diagram, he shows a 
switch by which his airborne computing system could, in effect, bypass the data from the commu- 
tator (which presumably would remain unimpaired) and would continue to send some kind of data 
down to the ground. There are other approaches to this reliability problem, in several varieties. They generally involve the use of some kind of equipment redundancy. One form which
has been used in a large-scale computer is the one in which you have a number of memory cells 
and a number of computing units, and these two are connected by a matrix. If one or more of these 
fails, you limp along with something less than the capability of the original system. Since it is 
programmed, you now reprogram to handle less information. I hope that we can use some such 
technique to remove the fears that you talk about and, as to your suggestion on using the computer 
to decode, it is a very good idea. 


11. CHANNEL NOISE - A LIMITING FACTOR ON THE PERFORMANCE 
OF A CLASS OF ADAPTIVE TECHNIQUES 
The present paper reports the results of an experimental program devoted to determining the effects of channel noise on a class of adaptive techniques. Solutions to certain problem areas are postulated, and the net efficiency of the coding techniques is determined.


The field of adaptive-compression telemetry techniques is relatively new compared with areas such as television and speech bandwidth compression. One of the earlier papers on the subject, whose central theme was the design of an efficient telemetry system, appeared in 1959 (Reference 1). Since that time, effort has been concentrated in four major areas: adaptive sampling, preprocessing, selective monitoring, and efficient encoding.

From the available literature it appears that the majority of users and investigators of telemetry compression are considering the zero-order interpolator. Unfortunately, the terminology is inconsistent: the zero-order interpolator is variously described as a zero-order predictor (Reference 2), self-adaptive compression (Reference 3), an adaptive sampling technique (Reference 4), selective monitoring (Reference 5), run-length encoding (Reference 6), floating aperture (Reference 7), redundancy removal (Reference 8), and the step method (Reference 9).

These techniques, although they sound basically different, operate identically. Although they have been applied to telemetry sources, they can also be applied to a television source, and for purposes of illustration the source will be considered a television source. Start with word 1, line 1, of a television frame. Word 1 is compared with word 2; if the absolute value of the difference is less than K, word 2 is disregarded, and the process continues until the difference is greater than K, say at word j. At this point, word 1 is transmitted to a buffer store along with the distance from word 1 to word j. Word j then replaces word 1 as the basis, and the process continues. It is this class of adaptive techniques that will be investigated.
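The procedure just described can be sketched in Python for a single scan line. This is a minimal sketch, not the EMR implementation. The function names are hypothetical. Treating a difference exactly equal to K as redundant, so that every reconstructed sample lies within ±K of its original, is an assumption, because the text covers only the "less than K" and "greater than K" cases. Each output pair holds a basis value and the number of samples it stands for, that is, the distance from word 1 to word j.

```python
from typing import List, Tuple


def zop_encode(line: List[int], k: int) -> List[Tuple[int, int]]:
    """Zero-order predictor for one scan line.

    Returns (basis value, run length) pairs. A sample is treated as redundant
    while |sample - basis| <= k; the equality case is an assumption.
    """
    segments: List[Tuple[int, int]] = []
    if not line:
        return segments
    basis, run = line[0], 1
    for sample in line[1:]:
        if abs(sample - basis) <= k:
            run += 1                       # redundant: disregard this word
        else:
            segments.append((basis, run))  # send basis and distance to word j
            basis, run = sample, 1         # word j becomes the new basis
    segments.append((basis, run))          # flush the final run of the line
    return segments


def zop_decode(segments: List[Tuple[int, int]]) -> List[int]:
    """Reconstruct the line by holding each basis value for its run length."""
    line: List[int] = []
    for value, run in segments:
        line.extend([value] * run)
    return line


if __name__ == "__main__":
    original = [10, 11, 12, 20, 21, 19, 40]
    coded = zop_encode(original, k=2)
    print(coded)              # [(10, 3), (20, 3), (40, 1)]
    print(zop_decode(coded))  # [10, 10, 10, 20, 20, 20, 40]
```

In this example, seven samples are carried by three transmitted pairs, a sample reduction of 7/3, and no reconstructed sample differs from its original by more than K = 2.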
PROBLEM DEFINITION


Adaptive techniques for data transmission can be applied to various sources, including video, speech, and telemetry data. There are three possible ways to take advantage of an adaptive technique: (1) if the bandwidth is fixed, adaptive-compression techniques can save transmitter power; (2) if the transmitter power is fixed, bandwidth can be conserved; and (3) if both the transmitter power and the bandwidth are fixed, more information can be transmitted per unit time. Assume, for example, that photographs are transmitted from an earth-orbiting vehicle with the rf parameters given in Table 11-1. With the channel capacity fixed at A = 10^6 bits per second, more information can be received per unit time by employing data compression.

Table 11-1

COMMUNICATION-LINK PERFORMANCE

Parameter                              Symbol   Value            Value (db)
Transmitter power                      P_t      2 watts          3 db
Transmitting antenna gain              G_t      Omni             0 db
Receiving antenna gain                 G_r      60-ft parabolic  49.9 db
Receiving system temperature           T        200°K
Transmitting frequency                 f        2300 MHz
Receiving noise spectral density       A                         -205.6 dbw/cps
Range                                  R        5000 n mi
Free space loss                        K                         179.0 db
Miscellaneous spacecraft loss          A                         3 db
Design margin                          A                         6 db
Bit rate                               Kb       10^6 bps         60 db
Energy/bit to noise spectral density   E/(N/B)                   10.5 db
FM bandwidth                           B_if     700 kHz          58.5 db
Carrier-to-noise ratio                 C/N                       12 db

If the PCM television frame is composed of x samples per line and y lines per frame, each sample is quantized to n bits, and F frames per second can be transmitted, the data rate is

$$R = Fnxy \ \text{bits/second}. \tag{1}$$


The number of data frames that can be transmitted per second is

$$F = \frac{A}{nxy} \ \text{frames/second}. \tag{2}$$


The amount of information that can be transmitted in t seconds is I = Ft. If F can be increased, I increases, and a saving results. One way of increasing F is to decrease nxy by some form of data compression.
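The derived entries of Table 11-1 and Equation 2 can be cross-checked with a short script. This sketch assumes that the miscellaneous spacecraft loss and the design margin are both subtracted from the received carrier power before E/(N/B) and C/N are formed. That assumption reproduces the tabulated 10.5 db and 12 db. The helper names and the `compression` argument are illustrative and do not come from the paper.

```python
import math

BOLTZMANN = 1.380649e-23        # J/K
SPEED_OF_LIGHT = 299_792_458.0  # m/s
NAUTICAL_MILE_M = 1852.0


def to_db(x: float) -> float:
    return 10.0 * math.log10(x)


def free_space_loss_db(range_m: float, freq_hz: float) -> float:
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * range_m / wavelength)


def link_budget() -> dict:
    """Recompute the derived rows of Table 11-1 from its input rows."""
    pt = to_db(2.0)                # 2 W transmitter, about 3 dbW
    gt, gr = 0.0, 49.9             # omni and 60-ft parabolic gains (db)
    n0 = to_db(BOLTZMANN * 200.0)  # kT at 200 K, about -205.6 dbW/cps
    lfs = free_space_loss_db(5000 * NAUTICAL_MILE_M, 2300e6)  # about 179.0 db
    misc, margin = 3.0, 6.0
    c_over_n0 = pt + gt + gr - lfs - misc - margin - n0
    return {
        "n0_dbw": n0,
        "free_space_loss_db": lfs,
        "eb_over_n0_db": c_over_n0 - to_db(1e6),  # 10^6 bits per second
        "c_over_n_db": c_over_n0 - to_db(700e3),  # 700 kHz IF bandwidth
    }


def frames_per_second(capacity_bps: float, n: int, x: int, y: int,
                      compression: float = 1.0) -> float:
    """Equation 2, F = A/(nxy), with nxy divided by a compression ratio."""
    return capacity_bps * compression / (n * x * y)


if __name__ == "__main__":
    for name, value in link_budget().items():
        print(f"{name}: {value:.2f}")
    print(f"PCM frames/s: {frames_per_second(1e6, 6, 256, 256):.3f}")
```

For the 6-bit, 256 × 256 frame used later in the paper, uncompressed PCM at 10^6 bits per second supports only about 2.5 frames per second. Any compression ratio multiplies that figure directly.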

ADAPTIVE SAMPLING TECHNIQUES 

The adaptive sampling technique that will be discussed is the zero-order 
predictor. Assume that the frame of data is composed of xy samples. If the 
coding algorithm is applied, the result is a sample reduction of 


$$C_s = \frac{xy}{S}, \tag{3}$$


where S is the number of nonredundant samples per frame. To reconstruct the data at the ground stations, each nonredundant sample must be addressed. Assuming that the absolute magnitude and the address of each nonredundant sample in a line are transmitted, the gross average data compression is

$$C_g = \frac{nxy}{[\log_2(x) + n]\,S}. \tag{4}$$


For example, consider n = 6 bits and x = y = 256. Then

$$C_g = 0.429\,C_s. \tag{5}$$

Hence, addressing the nonredundant samples results in approximately a 60 percent reduction in compression.
For the purposes of this analysis, it will be assumed that the incremental distance Δx between nonredundant samples is transmitted instead of the absolute address of the nonredundant samples; that is,

$$C_g = \frac{nxy}{[\log_2(\Delta x) + n]\,S}. \tag{6}$$

For n = 6 and Δx = 16, C_g = 0.6 C_s. With this form of addressing, the compression is reduced by only 40 percent, rather than by the approximately 60 percent incurred with absolute addressing.
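Both addressing schemes reduce to a single factor relating C_g to C_s, because C_s = xy/S (Equation 3). The sketch below evaluates that factor for the running example. The function names are illustrative, and log2 of the maximum distance is used as the number of address bits, as in Equations 4 and 6.

```python
import math


def gross_compression(n: int, x: int, y: int, s: int, address_bits: float) -> float:
    """C_g = nxy / ([address_bits + n] S), the form shared by Eqs. 4 and 6."""
    return n * x * y / ((address_bits + n) * s)


def gross_to_sample_ratio(n: int, address_bits: float) -> float:
    """C_g / C_s = n / (address_bits + n), since C_s = xy / S."""
    return n / (address_bits + n)


absolute = gross_to_sample_ratio(6, math.log2(256))    # Eq. 5: about 0.429
incremental = gross_to_sample_ratio(6, math.log2(16))  # about 0.6

if __name__ == "__main__":
    print(f"absolute addressing:    C_g = {absolute:.3f} C_s")
    print(f"incremental addressing: C_g = {incremental:.3f} C_s")
```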

CHANNEL NOISE 

Channel noise is another limiting factor on the efficiency of an adaptive sampling technique. Under the constraints of fixed transmitter power and bandwidth, channel noise affects the zero-order predictor more severely than PCM data transmission. Each transmitted segment now represents, on the average, C_s elements; therefore, a single bit error will affect many elements. It will be shown later that, depending upon the system implementation, a single error can destroy an entire frame of data, a single line of data, or just a few samples.

For the zero-order predictor analyzed, it will be assumed that perfect line synchronization is available and that the first element of each transmitted line is correct. Under these constraints, the maximum degradation that a single bit error can cause is the loss of an entire line of data; however, the error cannot propagate into the next line. The most logical solution to the problem is to increase the signal power (Reference 10); however, this is not permissible under the constraints of the stated problem. Therefore, once the channel characteristics are specified, error protection must be added to limit error propagation, either by error detection and correction or by error detection and retransmission. For the purposes of this paper it will be assumed that the former is employed, that the channel is binary symmetric, and that the error protection is a single-error-correcting code of the Hamming class. In this case, the net data compression is reduced further by the redundancy that the error-correcting code adds. In general, if P redundant bits are added to each transmitted word, the net data compression is given by

$$C_n = \frac{nxy}{[\log_2(\Delta x) + n + P]\,S} \tag{7}$$
or

$$C_n = C_g\,\frac{\log_2(\Delta x) + n}{\log_2(\Delta x) + n + P}, \tag{8}$$


which merely restates the fact that the net compression equals the gross average data compression times the efficiency of the error-correcting code. Carrying through the previous example, but adding a (14, 10) Hamming code,

$$C_n = 0.429\,C_s; \tag{9}$$

that is, when both error protection and addressing are employed, the net data compression is approximately 43 percent of the sample reduction. With the basic terms defined, it is now possible to compare the performance of the adaptive sampling system.
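Equation 8 factors the net compression into the gross compression times the code efficiency. The sketch below reproduces Equation 9 for n = 6, Δx = 16, and the four parity bits of the (14, 10) code. The function name is illustrative. The product collapses to n/(log2 Δx + n + P), which is Equation 7 divided by C_s.

```python
import math


def net_compression_ratio(n: int, delta_x: int, parity_bits: int) -> float:
    """C_n / C_s from Eq. 8: the gross factor times the code efficiency."""
    address_bits = math.log2(delta_x)
    gross = n / (address_bits + n)  # C_g / C_s, as in Eq. 6
    code_efficiency = (address_bits + n) / (address_bits + n + parity_bits)
    return gross * code_efficiency


ratio = net_compression_ratio(6, 16, 4)  # (14, 10) code adds P = 4: about 0.429

if __name__ == "__main__":
    print(f"C_n = {ratio:.3f} C_s")
```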

The system's performance will be compared with that of other systems on the basis of a subjective evaluation; however, so that the data can possibly be extrapolated to other sources, the rms error will also be used as a basis. For the zero-order predictor the peak error is normally specified as the error criterion; however, because channel noise expands the errors, it cannot be used as a basis of comparison. Since this error-expansion factor is quite pronounced for the lower error rates, another error criterion is needed. As a consequence, the experimental program was initiated to investigate the effects of channel noise.

EXPERIMENTAL PROGRAM 

The objectives of the experimental program were to evaluate subjectively 
the effects of channel noise on PCM and compressed data, to determine experi- 
mentally the rms error as a function of compression, and to determine experi- 
mentally the rms error between the compressed data and PCM data at varying 
channel bit error probabilities. 

Figure 11-1 is a block diagram of the conceptual experiment. A photograph was scanned by EDITS (Electro-Mechanical Research's (EMR's) experimental digital television system). The digital data were then transferred to EMR's ASI-210 digital computer and processed with various coding algorithms; channel noise was injected into the compressed data, and the data were then decoded and transmitted back to EDITS to be displayed and photographed.

Figure 11-1. Block Diagram of Experimental Setup

A third path existed in the computer: the addition of error correction in the form of a (7, 4) or (14, 10) Hamming code. The (7, 4) code, which was the easiest to implement and the fastest operationally, was used to protect only the Δx positional bits, since an error in position is weighted far more heavily than an error in intensity. A (14, 10) Hamming code, which protected both the intensity and the positional information, was also implemented on the computer. This code has the same net coding efficiency as the (7, 4) code; however, its figure of merit (the ratio of the word error probability after coding to the word error probability before coding) is not as large as that of the (7, 4) code, and it is an order of magnitude more complex in both implementation and encoding time. Nevertheless, as anticipated, the code performed more efficiently than the (7, 4) code.
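To show what the (7, 4) code does to a 4-bit Δx field, here is a textbook single-error-correcting Hamming (7, 4) encoder with a syndrome decoder. It is a generic sketch, not the routine run on the ASI-210. The bit ordering, with parity bits at positions 1, 2, and 4, is the conventional one and is an assumption here. A (14, 10) single-error-correcting code can be obtained by shortening the (15, 11) Hamming code by one bit; it is not shown.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct any single bit error and return the four data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based error position, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def bits_of(value: int, width: int = 4) -> List[int]:
    """Split a Δx value into bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


if __name__ == "__main__":
    word = hamming74_encode(bits_of(11))  # a Δx of 11
    word[4] ^= 1                          # a single channel error
    print(hamming74_decode(word) == bits_of(11))  # True
```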

ERROR PERFORMANCE 

In the past, most investigators have ignored the effects of channel noise on PCM data or have argued that, with compression, the bandwidth is reduced and therefore the noise is also reduced. An error in the most significant bit obviously has more effect than an error in the least significant bit; that is, not all bit errors are equally weighted. Deriving the rms error as a function of the channel bit error probability is straightforward. The rms error as a function of the channel bit error probability P_eb and the number of quantizing bits N is given by

$$\mathrm{rms}_{\mathrm{PCM}} = \sqrt{\frac{P_{eb}}{3}\left(1 - \frac{1}{4^{N}}\right)}. \tag{10}$$
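Equation 10 can be checked numerically. The sketch below evaluates the closed form and compares it with a Monte Carlo simulation of a binary symmetric channel. It makes two assumptions: the error is expressed as a fraction of the full-scale range 2^N, and sample values are uniformly distributed. Under the second assumption the closed form also holds when several bits of one word are hit, because the cross terms average to zero. Function names are illustrative.

```python
import math
import random


def rms_pcm(p_eb: float, n_bits: int) -> float:
    """Equation 10: rms error as a fraction of full scale 2**N."""
    return math.sqrt(p_eb / 3.0 * (1.0 - 4.0 ** (-n_bits)))


def rms_pcm_simulated(p_eb: float, n_bits: int, n_samples: int = 100_000,
                      seed: int = 1) -> float:
    """Monte Carlo estimate over a binary symmetric channel."""
    rng = random.Random(seed)
    full_scale = 2 ** n_bits
    sq_sum = 0.0
    for _ in range(n_samples):
        value = rng.randrange(full_scale)  # uniformly distributed sample
        received = value
        for bit in range(n_bits):
            if rng.random() < p_eb:        # each bit flips independently
                received ^= 1 << bit
        sq_sum += ((received - value) / full_scale) ** 2
    return math.sqrt(sq_sum / n_samples)


if __name__ == "__main__":
    for p in (1e-3, 1e-2):
        print(p, round(rms_pcm(p, 6), 5), round(rms_pcm_simulated(p, 6), 5))
```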


Errors in a video PCM scene appear as a salt-and-pepper effect and are confined to individual samples; for the zero-order predictor, however, this is not the case. For the zero-order predictor a single-bit error can propagate over the average sample reduction, it can propagate the entire length of a line and destroy all the samples, or it can merely cause a finite percentage of the samples in the line to be in error. Assume that Δx is encoded to M bits and the intensity to N bits; therefore, for each nonredundant sample, an (N + M)-bit word is transmitted.

Figure 11-2 shows the error-analysis model. State A occurs when the decoder is in perfect synchronization with the encoder, that is, when transmission is error free. State B occurs when a single bit error occurs in the intensity bits; this causes an error that propagates over the average sample reduction, after which the system returns to state A. State C represents the system being completely out of synchronization; however, the error propagation is finite, since this can be described as a sliding synchronization state: an error in the least significant bit of Δx will cause a displacement of ±1 sample for each segment, but the remaining elements will still have the correct intensity, since on the average the number of successive samples of the same intensity equals the sample reduction. Once the system enters state C, it cannot reenter state A without additional constraints on the decoder; for the purposes of this analysis, the decoder returns to state A at the start of each line. State D occurs when an error occurs in the most significant bits of Δx. Such an error causes complete loss of synchronization with zero probability of returning to state A and, in general, the loss of the entire line. As with state C, the system returns from state D to state A at the beginning of each line by external control.

Figure 11-2. Analysis Model

Assuming that the channel bit error probability is P_eb, the probability of being in state A is (1 - P_eb), and the rms error in that state is 0. To a first-order approximation, the probability of being in state B is

$$\mathrm{Prob}(\text{state B}) = \frac{N}{N+M}. \tag{11}$$

The probabilities of being in state C and in state D are equal and are given by

$$\mathrm{Prob}(\text{state C}) = \mathrm{Prob}(\text{state D}) = \frac{1}{2}\left(\frac{M}{N+M}\right). \tag{12}$$


The rms error can be considered as an rms error-expansion factor. Assuming that the encoded frame is used as the basis of comparison, the rms error is given by

$$\mathrm{rms}_{\mathrm{ZOPE}} = \sqrt{MS_A + MS_B + MS_C + MS_D}. \tag{13}$$


Hence, an error in intensity will have an rms value of

$$\mathrm{rms} = \sqrt{\frac{P_{eb}}{3}\left(1 - \frac{1}{4^{N}}\right)}, \tag{14}$$

and it will occur a fraction N/(N + M) of the time and will propagate over C_s samples. Thus,

$$MS_B = \frac{N}{N+M}\,C_s\,\frac{P_{eb}}{3}\left(1 - \frac{1}{4^{N}}\right). \tag{15}$$


An error in Δx that causes the system to enter state C will occur a fraction M/[2(N + M)] of the time and will cause an rms error

$$\sqrt{\frac{P_{eb}}{3}\left(1 - \frac{1}{4^{N}}\right)} \tag{16}$$

to be weighted by E(S_L) samples/line times E(D), the expected value of the displacement. It can be shown that

$$E(S_L) = \frac{1}{3}\left[\frac{Nx}{(N+M)C_g}\right]^2 + \frac{Nx}{2(N+M)C_g} + \frac{1}{6} \tag{17}$$


and

$$E(D) = [\text{illegible in source}], \quad M \text{ an even integer}. \tag{18}$$
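Write m = Nx/[(N + M)C_g]. Because C_g = N C_s/(N + M), m equals x/C_s, the average number of transmitted words per line. Equation 17 then simplifies to (m + 1)(2m + 1)/6, which is the mean of k² over k = 1, ..., m. The exact-arithmetic sketch below checks this identity for one hypothetical case (N = 6, M = 4, x = 256, C_s = 8). The function names are illustrative.

```python
from fractions import Fraction


def words_per_line(n_bits: int, m_bits: int, x: int, c_g: Fraction) -> Fraction:
    """m = Nx / ((N + M) C_g); equals x / C_s because C_g = N C_s / (N + M)."""
    return Fraction(n_bits * x) / ((n_bits + m_bits) * Fraction(c_g))


def expected_s_l(m: Fraction) -> Fraction:
    """Equation 17 written in terms of m."""
    return m * m / 3 + m / 2 + Fraction(1, 6)


# Hypothetical case: N = 6, M = 4, x = 256, C_s = 8, so C_g = 6 * 8 / 10 = 24/5.
m = words_per_line(6, 4, 256, Fraction(24, 5))
assert m == 32  # x / C_s
assert expected_s_l(m) == (m + 1) * (2 * m + 1) / 6
assert expected_s_l(m) == Fraction(sum(k * k for k in range(1, 33)), 32)
```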


A single bit error in Δx that places the system in state C will thus have an rms value of

$$\mathrm{rms}_C = \frac{M}{2(M+N)}\sqrt{\frac{P_{eb}}{3}\left(1 - \frac{1}{4^{N}}\right)}\;E(D)\left\{\frac{1}{3}\left[\frac{Nx}{(N+M)C_g}\right]^2 + \frac{Nx}{2(N+M)C_g} + \frac{1}{6}\right\}. \tag{19}$$

An error in Δx that places the system in state D will have an rms value of

$$\sqrt{\frac{P_{eb}}{3}\left(1 - \frac{1}{4^{N}}\right)}, \tag{20}$$

occurring a fraction M/[2(M + N)] of the time and weighted by a further expected-value factor. It can be shown that this factor is given by

$$[\text{illegible in source}]. \tag{21}$$

Therefore, the rms error introduced when the system is in state D is given by

$$\mathrm{rms}_D = \frac{M}{2(M+N)}\sqrt{\frac{P_{eb}}{3}\left(1 - \frac{1}{4^{N}}\right)} \times (\text{factor of Equation 21}). \tag{22}$$


The total rms error expansion for the zero-order predictor is given by

$$\mathrm{rms}_{\mathrm{ZOPE}} = \sqrt{MS_A + MS_B + MS_C + MS_D}. \tag{23}$$

This is only the part of the total error that is caused by channel noise, and it is relative to the encoded frame. The total error is this error plus the rms error due to the encoding process.

RMS ERROR DUE TO ENCODING 

The coding algorithm introduces an rms error that, because of the coding technique itself, is irreducible. For the zero-order predictor with an error band of ±K elements, a single frame of data contains, on the average,

$$\frac{Nxy}{(N+M)C_g} \tag{24}$$


samples which by definition are correct. Hence, the number of elements that could be in error is

$$xy\left[1 - \frac{N}{(N+M)C_g}\right], \tag{25}$$

or a fraction 1 - N/[(N + M)C_g] of the elements may be in error. The expected value of the error is given by

$$E(e) = \frac{2K^2 + 3K + 1}{6}. \tag{26}$$


However, of the elements that may be in error, the expected number actually in error is determined by the amount of redundancy in the frame of data. Let PE be defined as the amount of redundancy, which increases as the error band increases; the fraction of elements that may be in error is then

$$\left[1 - \frac{N}{(N+M)C_g}\right]\left[1 - PE(N)\right]. \tag{27}$$


Consequently, the rms error due to the encoding algorithm is given by

$$\mathrm{rms}_{\mathrm{ZOP}} = \sqrt{\left[1 - \frac{N}{(N+M)C_g}\right]\frac{2K^2 + 3K + 1}{6}\left[1 - PE(N)\right]}. \tag{28}$$


The total rms error as a function of the channel bit error probability is given by

$$\mathrm{rms} = \mathrm{rms}_{\mathrm{ZOP}} + \mathrm{rms}_{\mathrm{ZOPE}}. \tag{29}$$

These theoretical predictions will now be compared with actual experimental results.
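Equations 24 and 25 are easy to evaluate. The sketch below evaluates them for the 256 × 256 frame with N = 6. It takes M = 4 (a maximum Δx of 16, as in the earlier example) and the zero-order-predictor sample reduction C_s = 6.05 reported with Figure 11-3. These parameter choices are assumptions for illustration. Equation 28 is not evaluated, because PE(N) is a property of the image data and is not given numerically here.

```python
def nonredundant_samples(n_bits: int, m_bits: int, x: int, y: int, c_g: float) -> float:
    """Eq. 24: average number of (by definition correct) nonredundant samples."""
    return n_bits * x * y / ((n_bits + m_bits) * c_g)


def samples_possibly_in_error(n_bits: int, m_bits: int, x: int, y: int,
                              c_g: float) -> float:
    """Eq. 25: the remaining samples, reconstructed only to within ±K."""
    return x * y * (1.0 - n_bits / ((n_bits + m_bits) * c_g))


N, M, X, Y = 6, 4, 256, 256
C_S = 6.05
C_G = N * C_S / (N + M)  # gross compression for this C_s (incremental addressing)

kept = nonredundant_samples(N, M, X, Y, C_G)          # equals xy / C_s, about 10,832
at_risk = samples_possibly_in_error(N, M, X, Y, C_G)  # about 83 percent of the frame

if __name__ == "__main__":
    print(f"nonredundant: {kept:.0f}, possibly in error: {at_risk:.0f}")
```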


EXPERIMENTAL RESULTS 

Because of the nature of the source data, both subjective and rms experi- 
mental results will be given. 
Subjective Results


Figure 11-3 compares the subjective results of the zero-order predictor, the 6-bit PCM original, and the linear-approximation coding technique. For rather large apertures (error bands), contouring results for the zero-order predictor; however, the contouring can be reduced somewhat by the linear-approximation technique. The zero-order predictor can be thought of as a run-length (zero-slope) encoder with an error band. The linear approximation allows a finite number of slopes with an error band; therefore, it should eliminate the contouring effect as the aperture is gradually increased. Figure 11-4 illustrates the subjective effect of varying the compression for the zero-order predictor.

Figure 11-3. Comparison of Zero-Order Predictor and Linear Approximation Coding Techniques (panels: 6-bit PCM; linear approximator, C_s = 7.53, rms = 3.53%; zero-order predictor, C_s = 6.05, rms = 3.54%)

The effects of channel noise on a zero-order predictor are given in Figure 11-5. Here the error band is fixed, and the channel bit error probability is varied from 10^-4 to 10^-1. It can be seen that, compared with PCM, the compressed data tolerate less channel noise. Therefore, with the channel capacity and transmitter power fixed, error protection must be added to the compressed transmitted data.

Figures 11-6, 11-7, and 11-8 illustrate the effectiveness of the (14, 10) and (7, 4) codes at a channel bit error probability of 10^-3 with the compression variable. Here the effects of channel noise become more pronounced as compression increases; however, the error protection is quite effective in reducing the error expansion.

Figure 11-4.

RMS Error Results 


As stated earlier, the rms error for the zero-order predictor can be considered as consisting of two parts: (1) the irreducible coding error and (2) the error caused by channel noise, which gives an rms error expansion. Figure 11-9 plots the rms error as a function of the gross average compression. At an error band of ±5 out of 64 possible levels, the rms error is 3.54 percent; however, the peak error is 7.85 percent. Since the peak error is normally specified, and the actual rms error is less than one-half of the peak, it appears that peak error is not an especially good criterion for describing the zero-order predictor. Furthermore, the subjective photographs of Figure 11-3, which contain an rms error of 3.54 percent (a high value relative to the quantizing noise), show that the picture is subjectively acceptable. Therefore, it appears that neither the rms nor the peak error is an acceptable error criterion for television data. Also given in Figure 11-9 is Equation 28, the analytical prediction of the rms error. Although the rms error is not a good error criterion for TV, it will be acceptable for certain telemetry channels, and the analytical expression should be valid for telemetry data.

Figure 11-5. The Effects of Channel Noise on a Zero-Order Predictor with a Sample Reduction of 6.05 and a (14, 10) Hamming Code

Figure 11-6. The Effects of Channel Noise on a Zero-Order Predictor Data Compactor


Figure 11-10 illustrates the effectiveness of the error-correcting code, along with the equivalent PCM error. Although the code's figure of merit increases monotonically as the bit error probability decreases, the code can never reduce the error below the basic rms error of the coding technique. As the error rate increases, the rms errors with and without protection converge to approximately 25 percent. Below P_eb = 10^-2, the effectiveness of the error-correcting code in reducing the rms error decreases, and so does its usefulness.

The analytical prediction based on Equation 29 agrees quite well with the experimental results. There appears to be an optimum error reduction, as evidenced by the PCM and the zero-order predictor with the (14, 10) code having nearly the same rms error at P_eb = 10^-2. In this case, at the same rms error, the zero-order predictor without error protection operates at P_eb = 9 × 10^-4. Hence, Figure 11-11 plots what can be called a bit error improvement factor, that is, the ratio of the bit error probability with coding to the bit error probability without coding for the same rms error. For example, at P_eb = 10^-2 with the (14, 10) code, the zero-order predictor without protection reaches the same rms error at P_eb = 9 × 10^-4, a bit error improvement of 11.1.

Figure 11-7. The Effects of Channel Noise on a Zero-Order Predictor Data Compactor
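Written out, the bit error improvement factor quoted above is the ratio of the two channel bit error probabilities that give the same rms error:

$$\text{improvement} = \frac{P_{eb}\big|_{(14,\,10)\ \text{code}}}{P_{eb}\big|_{\text{no protection}}} = \frac{10^{-2}}{9 \times 10^{-4}} \approx 11.1$$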

The improvement with coding can also be seen from Figure 11-12, which plots the difference between the rms error without and with error protection as a function of the channel bit error probability. Here, the error-correcting code's figure of merit is multiplied by the rms error, where both curves are functions of P_eb. The figure of merit of the code increases monotonically as P_eb decreases, whereas the rms error for the zero-order predictor decreases monotonically to a constant value as P_eb decreases. This suggests that there is an optimum channel bit error probability for maximum error reduction; however, this error probability is not necessarily optimum for minimum rms error. Hence, by plotting the difference between Equation 23 and the same equation modified by the figure of merit of the error-correcting code, one obtains Figure 11-12, which gives experimentally the optimum P_eb for maximum rms error reduction.

Figure 11-8. The Effects of Channel Noise on a Zero-Order Predictor Data Compactor
Figure 11-11. Bit Error Probability Improvement for (14, 10) Code Relative to Original
and Encoded Frames for Zero-Order Predictor (5-0-16) 


In view of Figure 11-12, the data were replotted in Figure 11-13 to give a 
figure of merit for the error-correcting code. Given a minimum acceptable 
rms error before error correction, it is possible to determine the rms error 
after error correction. This can be considered as a figure of merit for the 
error-correcting code with the zero-order predictor. 

Figures 11-14 and 11-15 show how another error criterion could possibly 
be employed. Both rms error and percent similarity for the zero-order pre- 
dictor and linear approximator relative to the encoded form are plotted. Note 
that for the zero-order predictor without channel noise the rms error is 3.54 
percent whereas the percent similarity relative to the original data is 35.2 per- 
cent. It appears that the percent similarity has more meaning than the rms 
error at this particular point. 
Figure 11-13. RMS Error After Error Correction Versus RMS Error
Before Error Correction for Zero-Order Prediction (5-0-16) 

CONCLUSIONS 

It can be concluded that, under the constraints of fixed transmitter power and fixed bandwidth, compression can be used to obtain more information per unit time. Furthermore, under these constraints, some form of error protection must be employed. The type and form of the error protection will be a compromise among implementation, desired efficiency, and channel characteristics. When error protection is employed with this class of techniques, the effectiveness of the code is decreased considerably. Sample reduction is not a valid way to compare system performance for this class of techniques. The effects of channel noise are more pronounced on compressed systems; therefore, error protection is required. When the addressing of the nonredundant samples and the redundancy of an error-protection code are taken into account, the net coding compression is reduced by more than 50 percent.

Figure 11-15.

This experimental program requires additional effort to verify the analytical expressions as applied to the broad category of telemetry data. Various addressing techniques, including optimum encoding of the addresses, might be investigated as a means of increasing the net compression, and the effects of channel noise on self-synchronizing codes should also be investigated. Although a television source was employed in the analysis, it must be emphasized that these results will, in general, apply to the broad class of adaptive sampling telemetry techniques, since the television source can be considered an extremely active telemetry channel.
12. ELIMINATING REDUNDANCY IN FIXED FRAME TELEVISION


L. W. Gardenhire 
Melbourne, Florida


INTRODUCTION

The present author has been appalled by the disregard for the basic information content of the original data being processed by various data compressors, and dismayed by the lack of practical consideration of what the limiting factors really are in areas such as conversion and channel noise. Rarely has the problem been related to how many uniformly spaced samples are needed to reproduce the original data; this is a necessary first step. The magic term "compression ratio" can become anything we want, depending upon the choice of sampling rate. One can make any redundancy reduction scheme appear better than any other merely by undersampling or oversampling.
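The dependence on sampling rate is easy to demonstrate. The toy sketch below assumes a 1 Hz sine wave, a tolerance of ±0.05 of full scale, and the zero-order rule described in the preceding paper; all of these are illustrative choices, and the function names are hypothetical. As the sampling rate rises, the number of nonredundant samples stays roughly constant while the number of uniformly spaced samples grows. The apparent compression ratio therefore grows almost in proportion to the oversampling.

```python
import math
from typing import List


def zop_kept(samples: List[float], k: float) -> int:
    """Count nonredundant samples under a zero-order predictor with tolerance k."""
    if not samples:
        return 0
    kept, basis = 1, samples[0]
    for s in samples[1:]:
        if abs(s - basis) > k:  # outside the error band: transmit a new basis
            kept += 1
            basis = s
    return kept


def apparent_compression(rate_hz: float, k: float = 0.05,
                         tone_hz: float = 1.0, duration_s: float = 1.0) -> float:
    """Uniform samples divided by nonredundant samples for one sampled sine."""
    n = int(rate_hz * duration_s)
    samples = [math.sin(2 * math.pi * tone_hz * i / rate_hz) for i in range(n)]
    return n / zop_kept(samples, k)


if __name__ == "__main__":
    for rate in (100, 1_000, 10_000):
        print(rate, round(apparent_compression(rate), 1))
```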

Every digital redundancy reduction scheme that has ever been devised is directly related to how the waveform was originally sampled. The tremendous amount of redundancy occurs because the data are neither Gaussian nor stationary.

The chosen sampling rate is based upon the frequency of the fastest transient expected; most of the time these frequencies do not occur, and therefore there is much redundancy in the data. The sampling rate must be high, however, in order to capture the waveforms in case they do occur. Thus, there is an opportunity to provide a truly adaptive system that will not lose data but will reconstruct any waveform with a minimum number of samples to any accuracy specified by the user.

SPECTRAL CHARACTERISTICS 

There is a direct relationship between the spectral characteristics of the 
original sampled waveform and the interpolation error created by whatever 
interpolation method is used to reconstruct the sampled data. This holds for the 
output of a transducer, the response of a system of any order to a step function, 
the reconstruction of one line of TV data, or, for that matter, the average 
spectral characteristics of an entire TV picture. 

The practical optimum interpolation process for any properly sampled data is 
one order less than the order of the original data. When the data are 
undersampled, this relationship does not hold, which has led many people to 
assume that zero-order redundancy reduction schemes are more effective than 
higher-order ones. By the same token, many reports have shown large, unrealistic 
compression ratios simply by oversampling the data. Most of these reports give 
no basis for selecting the sampling rate, or do not even mention what it was. 
When the original data are not sampled fast enough to recover them to a given 
accuracy, it is not realistic to try to reconstruct them to a greater accuracy. 
A slight gain is indeed realized by going to higher-order interpolation schemes, 
but the gain is very small and the interpolation error never goes to zero. 
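How the interpolation error depends on both sampling rate and interpolator order can be illustrated with a small numerical experiment. This is a minimal sketch under stated assumptions: a unit sine wave stands in for the data (real transducer or TV data have broader spectra), and the function name is hypothetical. It compares zero-order hold with straight-line interpolation; for an adequately sampled sine, halving the sample spacing roughly halves the zero-order error but quarters the first-order error.

```python
import math

def rms_interp_error(samples_per_cycle, order, fine=50):
    """RMS reconstruction error, as a fraction of peak, for a unit sine wave
    sampled uniformly at samples_per_cycle and rebuilt by zero-order hold
    (order=0) or straight-line interpolation (order=1)."""
    step = 1.0 / samples_per_cycle            # sample spacing, in cycles
    total, count = 0.0, 0
    for k in range(samples_per_cycle):         # one full cycle of intervals
        a = math.sin(2 * math.pi * k * step)
        b = math.sin(2 * math.pi * (k + 1) * step)
        for j in range(fine):                  # evaluate between the samples
            u = j / fine
            true = math.sin(2 * math.pi * (k + u) * step)
            est = a if order == 0 else a + (b - a) * u
            total += (true - est) ** 2
            count += 1
    return math.sqrt(total / count)

for spc in (4, 8, 16, 32):
    print(spc, round(rms_interp_error(spc, 0), 4), round(rms_interp_error(spc, 1), 4))
```

At the coarsest rates both errors are large, which corresponds to the undersampled case described above, where no interpolator recovers the data well.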

REDUNDANCY REDUCTION 

Redundancy reduction hardware need not be complicated. The only function of 
predictors and interpolators is to reconstruct the waveform to a given accuracy 
with a minimum number of samples. In other words, the waveform is sampled at a 
uniform rate fast enough to ensure that samples fall at the proper places on the 
waveform; interpolation can then be performed between the nonredundant samples 
and checked to see that the result is within the desired accuracy. Any 
redundancy reduction scheme reconstructs the data and, in doing so, reproduces 
all of the errors created by the original sampling plus its own reconstruction 
errors. If the data were undersampled in the first place, producing aliasing 
errors or foldover frequencies that do not exist in the original signal, the 
redundancy reduction scheme cannot distinguish them from real data and thus 
transmits them as nonredundant data. 
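The foldover effect can be checked numerically. In the sketch below (function names hypothetical, frequencies in cycles per second assumed for illustration), a cosine above half the sampling frequency produces exactly the same samples as a lower-frequency cosine, so nothing downstream of the sampler, including a redundancy remover, can tell them apart.

```python
import math

def alias_frequency(f, fs):
    """Frequency at which a tone f appears after uniform sampling at fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f

def same_samples(f1, f2, fs, n=50):
    """True if cosines at f1 and f2 give identical sample sequences at fs."""
    return all(
        abs(math.cos(2 * math.pi * f1 * k / fs) - math.cos(2 * math.pi * f2 * k / fs)) < 1e-9
        for k in range(n)
    )

print(alias_frequency(700, 1000))     # 300
print(same_samples(700, 300, 1000))   # True
```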

Because a relationship exists between the spectral characteristics of the 
original waveform and the interpolation process, there is also a relationship to 
the amount of redundancy, that is, to the redundancy curve for any reduction 
process. Since any redundancy reduction scheme is nothing more than a 
reconstruction or interpolation process using a minimum number of samples, it 
follows that the best redundancy reduction scheme is also one order less than 
the order of the data. 

STUDY PROGRAM 

Several redundancy reduction studies have been performed on fixed frame 
TV pictures that are slowly scanned and then transmitted at a faster rate over 
analog circuits. Such pictures, for example Tiros and Nimbus APT pictures, 
contain a large amount of conversion noise, which is the limiting factor in how 
much reduction is possible. The scanning process itself is rather noisy, 
particularly in slow electronic scanning systems. A 20 db signal-to-noise ratio 
at the output of a Vidicon is considered good, whereas transmitting a six-bit 
picture with no reduction requires about 40 db of signal-to-noise ratio. In order 
to avoid the conversion noise problem as much as possible, a flying spot scanner 
was used in this study to scan high quality photographs. 
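For a rough sense of why a 20 db Vidicon output falls far short of a six-bit picture, one can use the standard rule of thumb for uniform quantization of a full-scale sine wave, SNR of about 6.02N + 1.76 db for N bits. This is an assumption for illustration only: TV signal-to-noise ratios are often defined differently (for example, peak-to-peak signal to rms noise), so the numbers are indicative. By this rule six bits correspond to about 38 db, in line with the "about 40 db" quoted above, while 20 db supports only about three bits. The sketch also measures the SNR of a simulated quantizer to confirm the rule; all names are hypothetical.

```python
import math

def ideal_snr_db(bits):
    """Rule-of-thumb quantization SNR for a full-scale sine: 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76

def bits_for_snr(snr_db):
    """Invert the rule of thumb: how many bits a given SNR can support."""
    return (snr_db - 1.76) / 6.02

def measured_snr_db(bits, n=20000):
    """Quantize a full-scale sine with a uniform mid-rise quantizer and measure SNR."""
    levels = 2 ** bits
    step = 2.0 / levels
    sig = noise = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 0.1234567 * i)
        idx = max(-levels // 2, min(math.floor(x / step), levels // 2 - 1))
        q = (idx + 0.5) * step                 # center of the quantizing interval
        sig += x * x
        noise += (x - q) ** 2
    return 10 * math.log10(sig / noise)

print(round(ideal_snr_db(6), 2), round(measured_snr_db(6), 2))
print(round(bits_for_snr(20), 2))
```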

The redundancy studies reported here were performed with special Radiation 
Inc. equipment called a "Data Management Analyzer." The name reflects an 
understanding of the relationships described above, which allows the equipment 
to be used to determine much about the characteristics of data (i.e., what the 
proper sampling rate is if the data are analog, or what the interpolation error is 
if they are digital). Its operating speed is equivalent to that of two computers 
with an access time of 1.3 microseconds. It is a redundancy remover capable of 
either zero-order prediction or the "fan method" patented by Radiation Inc., 
which is a first-order predictor and interpolator. It also contains a 
reconstructor capable of either zero- or first-order reconstruction. 
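The fan method is identified here only as a patented first-order predictor and interpolator, without hardware details. As an illustration only, the sketch below implements the fan idea as it is commonly described in later literature: from the last kept sample, track the range ("fan") of slopes whose straight lines pass within the tolerance of every sample seen so far; when a new sample falls outside that fan, the previous sample becomes nonredundant and the fan restarts there. The function names, the tolerance convention (plus or minus tol in sample units), and the end-of-record handling are assumptions, not Radiation Inc.'s design.

```python
def fan_reduce(samples, tol):
    """Indices of samples kept by a fan-style first-order predictor/interpolator.

    Straight lines drawn between consecutive kept samples stay within +/-tol
    of every dropped sample in between.
    """
    n = len(samples)
    if n == 0:
        return []
    kept = [0]
    anchor = 0
    upper, lower = float("inf"), float("-inf")   # the open "fan" of slopes
    i = 1
    while i < n:
        dx = i - anchor
        slope = (samples[i] - samples[anchor]) / dx
        if lower <= slope <= upper:
            # sample i lies inside the fan: narrow the fan with its +/-tol band
            upper = min(upper, (samples[i] + tol - samples[anchor]) / dx)
            lower = max(lower, (samples[i] - tol - samples[anchor]) / dx)
            i += 1
        else:
            # fan closed: keep the previous sample and restart the fan from it
            anchor = i - 1
            kept.append(anchor)
            upper, lower = float("inf"), float("-inf")
    if kept[-1] != n - 1:
        kept.append(n - 1)
    return kept

def linear_reconstruct(samples, kept):
    """First-order reconstruction: straight lines between kept samples."""
    out = [0.0] * len(samples)
    out[kept[0]] = float(samples[kept[0]])
    for a, b in zip(kept, kept[1:]):
        for j in range(a, b + 1):
            out[j] = samples[a] + (samples[b] - samples[a]) * (j - a) / (b - a)
    return out
```

Because each kept sample itself lay inside the fan, the straight line drawn to it passes within the tolerance of every dropped sample, so the peak reconstruction error is bounded by the tolerance.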

DIGITAL PICTURE TRANSMISSION 

A great deal has been written about digital picture transmission; however, 
its actual use has been very limited because of the added complexity of both the 
transmitting and receiving stations as well as the increased bandwidth 
requirements. Redundancy reduction techniques have advanced to the point where 
the bandwidth requirements can be greatly reduced, and integrated circuits can 
cut both the cost and the complexity. 

When one considers the extreme bandwidth required for digital picture 
transmission, it is quite obvious that this is one of the best places to apply 
redundancy reduction techniques first. The limited power available on space 
probes, together with some security requirements, makes digital TV a must in 
many cases. As discussed earlier, the first and foremost item in any digital 
picture transmission system is to ensure that the sampling rate has been chosen 
to provide a resolution equivalent to the basic system capability. If the rate is 
higher than this, system capability is wasted; if it is lower, the resolution of 
the system is decreased. 

Although one picture may appear to the eye to be adequately sampled, the 
next picture with the same sampling rate, having more fine detail, will be quite 
obviously undersampled. A picture that is undersampled may also look quite 
good before the redundancy removal process; however, after it has been reduced, 
the undersampling damages the picture much more than an adequately sampled 
picture reduced to the same number of nonredundant samples. This shows the 
importance of having a sample fall at the correct place on the waveform pro- 
duced by scanning. 
CHOICE OF SAMPLING RATE 


As was the case in PCM telemetry, rules of thumb for selecting sampling 
rates have been generated for digital TV. Just as five samples per cycle of the 
maximum frequency has no basis, having as many samples in a line as there are 
lines in a square picture has no basis. This would hold only in an ideal case 
where the picture was scanned by a square spot that stepped across the picture 
with a spot size equal to the resolution elements. When a square spot is used 
to scan across a picture with maximum resolution elements at a fixed rate, 
triangular waveforms are produced that are very high in harmonic content. 

When the spot is round and smaller than the resolution element, square waves 
are produced, which are even higher in harmonic content. These frequencies, 
when added to and subtracted from the sampling frequency, produce foldover 
frequencies that are very detrimental if the sampling rate is not high enough. 
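The difference in harmonic content can be illustrated by computing Fourier amplitudes numerically. The sketch below (hypothetical function names, one period of each wave normalized to unit amplitude) uses a plain discrete Fourier sum. The odd harmonics of the square wave fall off roughly as 1/n and those of the triangle wave as 1/n squared, so the square wave's ninth harmonic is about a ninth of its fundamental, while the triangle's is about an eighty-first.

```python
import cmath
import math

def harmonic_amplitudes(wave, count, n=1024):
    """Amplitudes of harmonics 1..count of a periodic wave sampled n times per period."""
    x = [wave(k / n) for k in range(n)]
    amps = []
    for h in range(1, count + 1):
        c = sum(x[k] * cmath.exp(-2j * math.pi * h * k / n) for k in range(n))
        amps.append(2 * abs(c) / n)
    return amps

def square(t):
    return 1.0 if t < 0.5 else -1.0

def triangle(t):
    return 4 * t - 1 if t < 0.5 else 3 - 4 * t

sq = harmonic_amplitudes(square, 9)
tri = harmonic_amplitudes(triangle, 9)
for h in (1, 3, 5, 7, 9):
    # amplitude of harmonic h relative to the fundamental, square vs triangle
    print(h, round(sq[h - 1] / sq[0], 3), round(tri[h - 1] / tri[0], 3))
```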

To show this effect, a standard test pattern was scanned with a 300-line/inch 
scanning system. This analog pattern, as shown in Figure 12-1, has been trans- 
mitted and reconstructed. As can be seen by examining the wedge of converging 
lines, the resolution is not 0.003 inch (300 lines/inch) but more nearly 0.005 inch 
(200 lines/inch). This can be seen by noting where the lines merge. 

If the analog voltage produced by the scanning process is sampled at 250 
samples per line, encoded to five bits, and then reconstructed, the results are as 
shown in Figure 12-2. Note the extreme interference extending out to less than 
0.025 inch (50 lines/inch). As the sampling rate is increased, the interference 
patterns move toward the center, as seen in Figure 12-3, which is an enlargement 
of only the center circle, or 150 lines to the inch. There the sampling rate has 
been increased to 750 samples per line; note the interference patterns just inside 
the 150-line/inch circle. 

When the rate is increased to 1000 samples per line, the fringes move in to 
around 200 lines to the inch as seen in Figure 12-4. One thousand samples per 
line is then the proper sampling rate to maintain the resolution of this particular 
system. 

The rate was also checked by studying the spectral characteristics of several 
pictures sampled at 1000 samples per line; the rms interpolation error varied 
between 3 and 7 percent. 

Figure 12-1. Analog Test Pattern, 300 Lines/Inch 

Figure 12-2. Digital Test Pattern, 200 Samples/Line, 216,000 Total Samples 

Figure 12-3. Digital Test Pattern, 750 Samples/Line, 810,000 Total Samples 

Figure 12-4. Digital Test Pattern, 1000 Samples/Line, 1,080,000 Total Samples 

TEST PORTRAIT 

Although most of the studies performed have been on cloud cover pictures 
taken by the astronauts, the pictures shown here are more pleasing to look at, 
and the effects of the redundancy removal process are much easier to observe 
on pictures of the kind we are used to. 

Figure 12-5 shows a portrait that has been scanned and reproduced by analog 
means on the left and by digital means on the right. The original picture was 
scanned with a 0.003-inch spot using a photomultiplier tube, giving a resolution 
of 300 lines to the inch. The picture size was 2.875 x 3.60 inches, and the picture 
is composed of 1080 lines; at 1000 samples per line, the digital picture therefore 
has 1,080,000 five-bit samples, or 5,400,000 bits. 



Figure 12-5. Comparison of Analog and Digital Picture 
(Left: analog picture, 300 lines/inch, 1080 lines, 2.875 inches. Right: digital 
picture, 1000 five-bit samples/line, 1,080,000 samples, 5,400,000 bits.) 


In the original pictures, it is difficult, if not impossible, to tell the two 
apart. Much has been lost in the photographic reproductions, and the differences 
are attributable more to contrast variations in the photographic processing than 
to the digital process. 
RESULTS OF FAN METHOD WITH A FIRST-ORDER 
PREDICTOR AND INTERPOLATOR 

The results of processing the digital picture (Figure 12-6(a)) by the fan 
redundancy reduction method with a tolerance of 3.1 percent can be seen in 
Figure 12-6(b). The analog voltage from the photomultiplier tube was quantized 
to eight bits, the redundancy removed, and then only the five most significant 
bits transmitted. This was done to keep the quantization noise small enough not 
to affect the redundancy reduction process. The 3.1 percent tolerance means 
that the picture is reproduced to within 8 levels in a full scale of 256 levels. In 
practice, the reproducing device is capable of somewhat fewer than 16 levels of 
gray, so the tolerance, or peak error, is within one-half gray level of the original 
(8/256 = 1/32, half of 1/16), which is less than the resolution of the reproducer. 
The 1,080,000 original samples in the picture have been reduced to 172,000 total 
samples, or 159 samples per line. This yields a sample reduction S_r of 6.28:1. 

Some method must be used to indicate where the nonredundant samples 
occur, and this requires transmitting additional information. In this case, a 
four-bit run length code was used. The four-bit code following each five-bit 
nonredundant data sample tells how many redundant samples have been dropped 
since the last transmitted sample. If the run length exceeds 16, the sixteenth 
sample is transmitted even though it may be redundant. Each transmitted sample 
therefore costs nine bits instead of five, so the bit reduction B_r is five-ninths of 
the sample reduction S_r, or 3.49:1 in this case. 
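The timing scheme just described can be sketched as a small encoder. In this illustration (names and the per-record framing are assumptions), each transmitted word pairs a data sample with a count of dropped samples; a four-bit field counts at most 15, so any longer gap forces the sixteenth sample to be sent.

```python
def encode_runs(kept, timing_bits=4):
    """Convert sorted nonredundant-sample indices into (index, dropped) words.

    A timing field of timing_bits counts at most 2**timing_bits - 1 dropped
    samples, so longer gaps force extra, possibly redundant, transmissions.
    """
    max_run = 2 ** timing_bits - 1
    words = []
    last = -1
    for idx in kept:
        while idx - last - 1 > max_run:
            last += max_run + 1              # forced transmission
            words.append((last, max_run))
        words.append((idx, idx - last - 1))
        last = idx
    return words

def decode_runs(dropped_counts):
    """Recover sample positions from the transmitted run lengths."""
    idx, positions = -1, []
    for d in dropped_counts:
        idx += d + 1
        positions.append(idx)
    return positions

words = encode_runs([0, 40])
print(words)
print(len(words) * (5 + 4), "bits with 5 data bits and a 4-bit timing word")
```

Each word costs 5 + 4 = 9 bits against 5 bits for each original sample, which gives the factor of five-ninths between B_r and S_r, provided the forced words are counted among the transmitted samples.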

Again it is virtually impossible to tell the pictures apart. If it were not for 
conversion noise present on the data to be processed, the tolerance could be ex- 
tended to 1 part in 16 or 6.25 percent, and the two pictures would be almost 
identical. 

RESULTS OF FAN METHOD USING 6.25 PERCENT TOLERANCE 

When the tolerance is extended to 6.25 percent, the results are as shown in 
Figure 12-6(c). Here, the original 1,080,000 samples have been reduced to 
98,000, an average of only 91 samples per line. This gives a sample reduction 
S_r of 11.02:1 and a bit reduction B_r of 6.12:1. The smearing occurs because 
the original waveform is not reproduced exactly, largely as a result of the 
conversion noise present. This is not surprising when one realizes that only 
9 percent of the original samples are being used to reconstruct the picture. 
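The reported reductions follow directly from the sample counts. As a check, assuming five-bit data samples throughout and a four-bit timing word per transmitted sample, the sketch below recomputes S_r and B_r; the function name is hypothetical.

```python
def reductions(original, transmitted, data_bits=5, timing_bits=4):
    """Sample reduction S_r and bit reduction B_r for a run-length-coded picture.

    Each original sample carries data_bits; each transmitted word carries
    data_bits plus a timing word of timing_bits.
    """
    s_r = original / transmitted
    b_r = (original * data_bits) / (transmitted * (data_bits + timing_bits))
    return s_r, b_r

for label, count in [("fan, 3.1 percent", 172_000), ("fan, 6.25 percent", 98_000)]:
    s_r, b_r = reductions(1_080_000, count)
    print(f"{label}: S_r = {s_r:.2f}:1, B_r = {b_r:.2f}:1")
```

Passing `timing_bits=5` gives the corresponding figures for a five-bit timing word.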

Figure 12-6. Reduction By Fan Method 

RESULTS WITH FIVE-BIT TIMING WORD 

When the reduction gets large, the run length code is exceeded more and 
more often. There is a break point beyond which a five-bit run length code takes 
fewer bits than transmitting the additional forced samples required with a 
four-bit code. 
Figure 12-6(d) shows this effect. The tolerance has been extended to 9.4 percent, 
or 24 levels in 256. The 1,080,000 samples have now been reduced to 57,000 
nonredundant samples, so only 5.3 percent of the original samples are used to 
reconstruct the picture. These 57,000 samples are 10 bits long, with 5 information 
bits and 5 timing bits. If a four-bit timing word were used, there would be 90,000 
nine-bit nonredundant samples, or 810,000 total bits; with five-bit timing words the 
picture requires only 570,000 total bits. Figure 12-6(d) has an average of only 
53 samples per line. The sample reduction S_r is 18.95:1 and the bit reduction B_r 
is 9.47:1. The smearing effect is much more pronounced in this case; however, 
the picture is still quite usable. This smearing is again the result of the 
conversion noise present in the digital data: reconstructing from a few noisy 
samples does not reproduce the same waveform as reconstructing from a great 
many. Greater reduction can be obtained with less smearing if this noise is 
reduced, either before or after sampling. 
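The break point can be found by simply counting bits. Assuming the forced-transmission rule described earlier (a gap of g dropped samples costs one word plus one forced word for every 2^t samples it spans, with a t-bit timing field), the sketch below picks the timing-word width that minimizes total bits for a given list of gaps. The gap lists are made-up examples, not measured picture statistics, and the function names are hypothetical.

```python
def total_bits(gaps, timing_bits, data_bits=5):
    """Bits needed when each gap of g dropped samples costs one word plus
    g // 2**timing_bits forced words."""
    words = sum(1 + g // 2 ** timing_bits for g in gaps)
    return words * (data_bits + timing_bits)

def best_timing_bits(gaps, candidates=range(2, 9), data_bits=5):
    """Timing-word width that minimizes total transmitted bits."""
    return min(candidates, key=lambda t: total_bits(gaps, t, data_bits))

light = [5] * 1000      # hypothetical: short gaps (mild reduction)
heavy = [20] * 1000     # hypothetical: long gaps (heavy reduction)
print(best_timing_bits(light), best_timing_bits(heavy))
```

With short gaps a narrow timing word wins; once typical gaps exceed what a four-bit field can count, the extra bit per word costs less than the forced transmissions it avoids.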

SPECTRAL CHARACTERISTICS OF TEST PORTRAIT 

As mentioned earlier, the most effective redundancy reduction method is one 
order less than the order of the data. The spectral characteristics of this picture 
were measured with a narrow-band spectrum analyzer, averaging the output over 
the entire picture. The results showed the picture to be second order; that is, the 
spectral response curve of the data broke at about 100 cycles and decayed at 
12 db per octave. By the rule above, a first-order method such as the fan method 
should therefore be the most effective for this picture. 
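The link between spectral slope and order can be made explicit: well above its break frequency, an n-th order roll-off falls at about 6n db per octave, so a 12 db per octave decay indicates second-order data. The sketch below assumes a Butterworth-shaped magnitude response purely as a stand-in for a measured, averaged spectrum; the function names are hypothetical.

```python
import math

def butterworth_mag(f, fc, order):
    """Magnitude of an order-n Butterworth-shaped roll-off with break frequency fc."""
    return 1.0 / math.sqrt(1.0 + (f / fc) ** (2 * order))

def slope_db_per_octave(mag, f):
    """Change in level, in dB, from f to 2f."""
    return 20 * math.log10(mag(2 * f) / mag(f))

def estimate_order(mag, f):
    """Nearest integer order, using about 6 dB per octave per order."""
    return round(-slope_db_per_octave(mag, f) / 6.02)

def second(f):
    return butterworth_mag(f, 100, 2)   # break at 100 cycles, as in the text

print(round(slope_db_per_octave(second, 1600), 2), estimate_order(second, 1600))
```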

RESULTS OF STEP METHOD USING A ZERO-ORDER PREDICTOR 

Figures 12-7(a) through 12-7(f) show the operation of a zero-order (step 
method) redundancy reduction process on the same test picture. Figure 12-7(b) is 
for a tolerance of 3.1 percent, or 8 levels out of 256. The 1,080,000 original 
samples have been reduced to 230,000 nonredundant samples, or 213 samples per 
line. The sample reduction S_r is 4.70:1, and the bit reduction B_r is 2.61:1. The 
fan method required only 172,000 samples at this tolerance. 

Figure 12-7(c) is for a tolerance of 6.25 percent, or 16 levels out of 256. 
The 1,080,000 original samples have been reduced to 123,000, or 114 nine-bit 
samples per line. The sample reduction S_r is 8.78:1, and the bit reduction B_r is 
4.88:1. The quality of the picture is quite good, although false contouring is 
beginning to show. When the tolerance is increased so that the nonredundant 
samples are reduced to about 100,000, as was the case for the fan method at 
6.25 percent, the false contouring is quite bad. This can be seen in 
Figure 12-7(d), where the picture has been reduced to 102,000 nonredundant 
samples using a tolerance of 9.4 percent. 
Figure 12-7. Reduction By Step Method 




IMPORTANCE OF PROPER RECONSTRUCTION 

The false contouring referred to above results from large changes in the 
data between nonredundant samples. The fan method does much to break this up 
by simply connecting the nonredundant samples with a straight line rather than 
a step. Figure 12-9 demonstrates the extreme importance of the accuracy to 
which the line is reconstructed. Both pictures were coded to six bits, the 
redundancy removed, and then only the four most significant bits transmitted. 
Figure 12-9(b) reconstructs the straight line between nonredundant samples to 
only four bits, while Figure 12-9(a) reconstructs it to eight bits. This is possible 
because of the operation of the fan method, which is a patented process of 
Radiation Inc. Although the two nonredundant samples are known only to 4 bits, 
when the division is performed to replace the redundant samples along the 
straight line, it is carried out to 12 binary bits and reconstruction is performed 
to the 8 most significant bits. The more bits to which this division is carried out, 
the better the reconstructed samples will fit the straight line. The only difference 
between Figures 12-9(a) and 12-9(b) is that only the four most significant bits of 
the division were used for Figure 12-9(b), while eight were used for 
Figure 12-9(a). 

Figure 12-9. Control of False Contouring 
(a) Fan method, 3.9 percent tolerance, 132,000 four-bit samples, reconstructed 
to 8 bits. (b) Fan method, 3.9 percent tolerance, 135,000 four-bit samples, 
reconstructed to 4 bits.) 
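A fixed-point sketch of this reconstruction step follows. It assumes, as an illustration only (the actual Radiation Inc. arithmetic is not described beyond the bit counts above), that the two four-bit endpoint levels are placed in the top 4 bits of a 12-bit word, that the straight-line values between them are computed by integer division at that 12-bit precision, and that either the 4 or the 8 most significant bits are kept. The function name is hypothetical.

```python
def reconstruct_run(a, b, length, out_bits, frac_bits=12):
    """Straight-line fill between two 4-bit endpoint levels a and b.

    The division is carried out in fixed point with frac_bits total bits
    (the 4-bit levels occupy the top 4 bits), and each reconstructed value
    keeps only its out_bits most significant bits.
    """
    shift = frac_bits - 4
    A, B = a << shift, b << shift
    values = [A + ((B - A) * j) // length for j in range(length + 1)]
    return [v >> (frac_bits - out_bits) for v in values]

coarse = reconstruct_run(3, 5, 16, out_bits=4)
fine = reconstruct_run(3, 5, 16, out_bits=8)
print(coarse)                   # only three distinct levels: the steps remain
print([v / 16 for v in fine])   # 8-bit output in units of 4-bit levels
```

With only four bits kept, the straight line collapses back into a staircase of whole levels; keeping eight bits lets the reconstructed values follow the line in sixteenths of a level.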

A further improvement of this effect could be made by operating the recon- 
struction process at a higher sampling rate than was used for the reduction 
process. In other words, there would be more samples along the straight line, 
and, when filtered by an interpolation filter, the analog output waveform would 
more nearly match the input waveform. 

RESULTS OF HYBRID METHOD 

When a sampled waveform is reconstructed by interpolation with a higher 
order process, it produces fewer errors or, if the same error is acceptable, the 
original sampling rate can be reduced. By the same token, a waveform can have 
its redundancy removed by the zero-order process and be reconstructed by a 
linear interpolation process. This is known as a hybrid redundancy reduction 
process. Its main advantage is that the reduction equipment can be kept simple. 
It does, however, have several problems that must be considered. For certain 
waveforms, the peak error produced in reconstruction may approach twice the 
tolerance, for instance, when a waveform almost goes out of tolerance in a 
positive direction and then suddenly goes out of tolerance in a negative direction. 
If a straight line is drawn from the original sample to the sample that is just out 
of tolerance, there will be an error greater than the tolerance at the peak that 
almost went out of tolerance. In the TV case, sharp edges may also appear fuzzy 
when reconstructed by this method. 
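The near-miss case can be reproduced in a few lines. The sketch below (hypothetical names; the tolerance and waveform are chosen only for illustration) selects samples with a zero-order rule, then reconstructs once with steps and once with straight lines. With step reconstruction the error stays within the tolerance, but the straight line toward the out-of-tolerance sample passes far from the samples that nearly went out of tolerance in the other direction, so over a 16-sample run the peak error approaches twice the tolerance.

```python
def zero_order_select(samples, tol):
    """Zero-order (step) selection: keep a sample when it departs from the
    last kept sample by more than tol; the final sample is always kept."""
    kept = [0]
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[kept[-1]]) > tol:
            kept.append(i)
    if kept[-1] != len(samples) - 1:
        kept.append(len(samples) - 1)
    return kept

def step_reconstruct(samples, kept):
    """Hold each kept value until the next kept sample."""
    out = []
    for a, b in zip(kept, kept[1:]):
        out.extend([samples[a]] * (b - a))
    out.append(samples[kept[-1]])
    return out

def linear_reconstruct(samples, kept):
    """Hybrid reconstruction: straight lines between the zero-order selections."""
    out = []
    for a, b in zip(kept, kept[1:]):
        out.extend(samples[a] + (samples[b] - samples[a]) * (j - a) / (b - a)
                   for j in range(a, b))
    out.append(samples[kept[-1]])
    return out

def peak_error(samples, recon):
    return max(abs(s - r) for s, r in zip(samples, recon))

tol = 1.0
near_miss = [0.0] + [0.99] * 15 + [-1.01]   # hypothetical worst-case waveform
kept = zero_order_select(near_miss, tol)
print(kept)
print(peak_error(near_miss, step_reconstruct(near_miss, kept)))
print(peak_error(near_miss, linear_reconstruct(near_miss, kept)))
```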

When a four-bit timing word is used, however, neither of these problems is 
likely to cause trouble. The chance of the first problem occurring during a 
maximum of 16 samples is extremely small while the amount that a sharp edge 
will be shifted by this method will be only a small part of the 16 samples maxi- 
mum, which cannot be seen by the eye. 

In addition to the simplicity of the reduction process, the linear interpolation 
process requires a lower sampling rate than a zero-order or step process. 

The main advantage of the hybrid method over a straight step method, when 
used on TV, is that the large steps that produce false contouring are broken up. 
This can be seen by examining Figure 12-10(a). The top half of the picture was 
reduced and reconstructed by the step method, while the bottom half was 
reconstructed by linear interpolation. This is, however, a four-bit picture, which 
produces very bad false contouring. When the same process is carried out on a 
picture of five or more bits, the difference is small. Figure 12-10(b) shows the 
result of a straight step reduction on a five-bit picture. This again shows the 
importance of always transmitting a five-bit picture, and that the hybrid method 
is not needed when doing so. 

Figure 12-10. Operation of Hybrid Redundancy Reduction Method 
(a) Top half step method, bottom half hybrid method; 9.4 percent tolerance, 
4-bit picture, 102,000 samples. (b) Step method; 9.4 percent tolerance, 5-bit 
picture, 102,000 samples. 

UNIFORM SAMPLING COMPARED WITH REDUNDANT SAMPLES 

As mentioned earlier, any redundancy reduction scheme merely selects the 
minimum number of samples and interpolates between them. It is extremely 
important that a sample fall at the right location on the waveform to allow 
reconstruction to the desired accuracy. The importance of this statement can 
be seen by examining Figure 12-11. Figure 12-11(a) was sampled at a uniform 
rate of 189 four-bit samples per line and reproduced with no further processing. 
Figure 12-11(b) was sampled 1000 times per line using four-bit samples. The 
redundancy was removed by the fan method at a tolerance of 6.25 percent, or 
1 part in 16. This reduced the samples to 94 eight-bit samples per line. The two 
pictures therefore have the same total number of bits, 816,000. Even though the 
picture on the left has twice as many samples as the one on the right, its samples 
do not fall at the correct locations and thus produce a combination of ragged 
lines and false contouring. 

Figure 12-11. Importance of Correctly Locating Samples On Waveform 
(a) Uniform sampling: 204,000 four-bit samples, 816,000 total bits. (b) Reduced 
from correct sampling rate: 102,000 eight-bit samples, 816,000 total bits. 

CONCLUDING REMARKS 

The importance of recognizing the basic information content of the data to be 
sampled cannot be overstressed. As has been noted, there has been a complete 
disregard of the problem throughout industry. Although the transmission of a 
waveform is not the final answer to data compression, it is an important first 
step, since most telemetry systems in use today rely on it. 

It is also important to note that the real criterion for judging the results of 
redundancy reduction processes on television pictures is not compression ratio 
but the minimum number of bits necessary to transmit a picture of a given 
quality. 

The fact that two different problems exist in transmitting pictures should 
not be overlooked. In one, it is important to reduce power but not necessarily 
bandwidth; in the other, it is important to reduce bandwidth but not necessarily 
power. The first is more likely in the case of a satellite, and the second in 
transmission over existing ground circuits. When bandwidth is not important, it 
is possible to operate without a buffer by using an amplitude modulation scheme 
that simply leaves holes in the data where redundant samples were dropped. The 
location of the transmitted words could be determined on the ground by flywheel 
synchronization. The timing word could thus be dropped and the reduction greatly 
improved, eliminating the buffer and the problem of channel noise in the timing 
word. The peak power would be high, but the average power would be greatly 
reduced. 

Another important problem studied, but not discussed here, is the effect of 
channel noise on the reduced data. When the redundancy is removed, the 
remaining samples are much more vulnerable to channel noise. Much study has 
been given to this problem, most of it based on error-correcting, band-spreading 
types of codes. Unless a great deal of band spreading is done and the corrected 
words are long, the gains are small. The most satisfactory method appears to be 
to arrange the data so that an error changes a sample by only one level, which is 
not very objectionable. Errors in timing are also disastrous; however, these can 
be corrected within reason by using a real-time code and correcting it when noise 
causes it to fall out of sequence. 

Consideration has been given to the design of an airborne system to 
accomplish what has been demonstrated here. It appears that such a system could 
be built entirely of integrated circuits in a box 4 by 4-1/2 by 9 inches, or 162 cubic 
inches, including a 5000-word buffer. It would weigh about 12 pounds and 
require 7 to 10 watts. It would handle up to 100,000 eight-bit samples per second, 
a throughput equivalent to that of a general purpose computer with a cycle time 
of 0.1 microsecond. This would include the analog-to-digital converter but not 
the scanning system. 

The results shown here are believed to be quite pessimistic. With careful 
design of the scanning system and attention to the conversion noise problem, it is 
believed that both the amount of reduction and the quality of the pictures can be 
improved. 
EFFECTS OF FALSE CONTOURING 


False contouring is an effect that results from the combination of large 
quantizing levels and the eye's ability to detect very small changes in intensity 
when there are large areas to compare. A large area of slowly varying intensity, 
such as a sky background, is quite common in most pictures. If, for example, a 
density wedge is scanned with a one-bit sample, it will produce two gray levels. 
As the continuous wedge is scanned, the change of quantization level will occur 
at approximately the same place on every line, producing a false line, or contour, 
down the middle of the picture. When the sample is increased to two bits, the 
four quantizing levels will produce three lines and four different gray levels. As 
the number of quantizing levels is increased, the number of false lines increases 
and they become closer and closer together. With a five-bit picture, or 32 
quantizing levels, the lines are close enough together that the eye does not 
notice them. 
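The count of false contours in this example is easy to verify. Modeling the density wedge (as an assumption) as a linear ramp across a fixed number of picture elements, quantizing it to N bits leaves 2^N - 1 boundaries between 2^N gray levels; the sketch below counts them directly. The function names and the 4096-element width are hypothetical, and whether the lines become invisible at five bits depends on the eye and the display, which the code does not model.

```python
def false_contours(bits, width=4096):
    """Number of level boundaries when a linear density wedge spanning
    `width` picture elements is quantized to `bits` bits."""
    levels = 2 ** bits
    q = [x * levels // width for x in range(width)]   # quantized level per element
    return sum(1 for a, b in zip(q, q[1:]) if a != b)

def contour_spacing(bits, width=4096):
    """Average number of picture elements between adjacent false contours."""
    return width / 2 ** bits

for bits in (1, 2, 5):
    print(bits, false_contours(bits), contour_spacing(bits))
```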

COMPARISON OF FOUR-BIT AND FIVE-BIT REDUCED PICTURES 

Although the false contouring of an unreduced digital four-bit picture is not 
particularly bad, the effects become much more noticeable when the picture has 
been reduced by any redundancy reduction process. This can be seen in 
Figure 12-8. Figure 12-8(a) has been reduced from a 4-kc, 8-bit picture by using 
the fan method with 6.25 percent tolerance. Only the five most significant bits of 
the 98,000 nonredundant samples were transmitted and used for reconstruction. 
Figure 12-8(b) has been reduced to 107,000 nonredundant samples with the same 
tolerance, but only the four most significant bits were transmitted. This 
comparison shows graphically the need for transmitting a five-bit picture. 
Increasing beyond five bits does not, however, improve the picture significantly. 

When the step process is used with large tolerances, the results are even 
more disastrous. This can be seen in Figures 12-8(c) and 12-8(d). Figure 12-8(c) 
is a five-bit picture reduced to 102,000 nonredundant samples. Note the false 
contouring appearing on the left side of the face and neck. When a four-bit picture 
is reduced as in Figure 12-8(d), the effect is extremely bad. If a five-bit picture 
were reduced to 57,000 nonredundant samples using the step method and a 
five-bit timing word, as was done for the fan method, the picture would probably 
be worse than Figure 12-8(d). 

The effect shown is not one of conventional false contouring but a new one 
produced by the large value changes between nonredundant samples. It can be 
controlled by proper reconstruction methods. 
13. USE OF SAMPLE QUANTILES FOR DATA COMPRESSION OF SPACE TELEMETRY* 


I. Eisenberger 

Jet Propulsion Laboratory, CIT 
Pasadena, California 


On the assumption of normal parent populations and a large sample size, the 
following uses of sample quantiles for data compression are discussed: 
(1) estimating the mean and standard deviation, (2) two goodness-of-fit tests, 
(3) testing the mean of the population, (4) testing the standard deviation, 
(5) testing the mean and standard deviation simultaneously, (6) two-sample tests, 
and (7) tests of independence and estimation of the correlation coefficient. 
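The idea behind the first of these uses can be sketched briefly: a handful of sample quantiles, rather than every sample, is enough to estimate a normal population's mean and standard deviation. The sketch below uses the median and the quartiles together with Python's standard-library `NormalDist`; this particular choice of quantiles, the simulated data, and the function names are assumptions for illustration and are not necessarily the choices made in the paper.

```python
import random
from statistics import NormalDist

def sample_quantile(sorted_data, p):
    """Nearest-rank style sample quantile of already-sorted data."""
    return sorted_data[round(p * (len(sorted_data) - 1))]

def estimate_from_quantiles(data, p_low=0.25, p_high=0.75):
    """Estimate mean and standard deviation of a normal population from
    three sample quantiles: the median and a symmetric pair."""
    s = sorted(data)
    mean_hat = sample_quantile(s, 0.5)
    z_low, z_high = NormalDist().inv_cdf(p_low), NormalDist().inv_cdf(p_high)
    sigma_hat = (sample_quantile(s, p_high) - sample_quantile(s, p_low)) / (z_high - z_low)
    return mean_hat, sigma_hat

rng = random.Random(1)
data = [rng.gauss(10.0, 2.0) for _ in range(20000)]   # simulated telemetry readings
print(estimate_from_quantiles(data))
```

Only three numbers need to be transmitted in place of 20,000 samples, at the cost of some statistical efficiency relative to the sample mean and standard deviation.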


*This paper presents the results of one phase of research carried out at the Jet Propulsion 
Laboratory, California Institute of Technology, under Contract No. NAS 7-100, sponsored by 
the National Aeronautics and Space Administration. 
14. EPSILON-DELTA ENTROPY AND DATA COMPRESSION 


E. C. Posner 

Jet Propulsion Laboratory, CIT 
Pasadena, California 


A unified framework for the theory of data compression is presented. In 
the theory, one starts with a data source, which is abstracted into the notion of a 
probabilistic metric space (PMS). Briefly, a PMS is a metric space together 
with a probability distribution subject to certain mild requirements. The metric 
on the space corresponds to a "fidelity criterion"; that is, the distance between 
two points (two experimental outcomes) is a measure of the loss of fidelity if 
one outcome occurs but the other is assumed to have occurred. 

The probability distribution on the PMS describes the probability law gov- 
erning the experimental outcomes. The problem of data compression is not to 
observe an outcome at a remote point, but to transmit only certain information 
about the outcome. The transmitted information can be regarded as an indication 
of which set of possible outcomes occurred. 

Thus an epsilon-delta partition of a PMS, for positive values of epsilon and 
delta, is defined as a covering of all or part of the space by disjoint measurable 
sets, each having a diameter of at most epsilon. The part of the space not covered 
has a probability of at most delta. Epsilon represents the fidelity required and 
delta the allowable failure probability. For example, for an A-D converter, delta 
represents the off-scale probability. 

The data compression system works in the following manner. An outcome 
is observed, and the set of the epsilon-delta partition into which the outcome falls 
is transmitted. The epsilon represents the maximum uncertainty about the out- 
come when it is known into which set the outcome falls. It is felt that this con- 
cept covers all imaginable data compression techniques. 

To define the epsilon-delta entropy of a PMS, one must first define the 
entropy of an epsilon-delta partition. This entropy is merely Shannon's entropy 
of the discrete probability space obtained from the partition by regarding each 
set as a point of the same probability as the set when the probabilities are 
normalized so as to add up to unity. 
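For a finite source on the real line with distance |x - y|, the entropy of a given epsilon-delta partition can be computed directly from this definition. The sketch below (function name and example source hypothetical) checks the epsilon and delta conditions and returns the entropy in bits of the normalized cell probabilities. It evaluates one partition only, whereas the epsilon-delta entropy of the PMS is the infimum over all such partitions.

```python
import math

def partition_entropy(points, probs, cells, eps, delta):
    """Entropy (bits) of an epsilon-delta partition of a finite source on the line.

    points: outcome values; probs: their probabilities;
    cells: disjoint lists of point indices, each of diameter <= eps.
    Points left out of every cell must have total probability <= delta.
    """
    covered = set()
    masses = []
    for cell in cells:
        values = [points[i] for i in cell]
        if max(values) - min(values) > eps:
            raise ValueError("cell diameter exceeds epsilon")
        if covered & set(cell):
            raise ValueError("cells must be disjoint")
        covered |= set(cell)
        masses.append(sum(probs[i] for i in cell))
    uncovered = sum(p for i, p in enumerate(probs) if i not in covered)
    if uncovered > delta + 1e-12:
        raise ValueError("uncovered probability exceeds delta")
    total = sum(masses)
    # renormalize the cell probabilities to sum to one, then apply Shannon's formula
    return -sum(m / total * math.log2(m / total) for m in masses if m > 0)

pts = list(range(8))
pr = [1 / 8] * 8
print(partition_entropy(pts, pr, [[0, 1], [2, 3], [4, 5], [6, 7]], eps=1, delta=0))
print(partition_entropy(pts, pr, [[0, 1], [2, 3], [4, 5]], eps=1, delta=0.25))
```

Allowing a failure probability of one-quarter lets one cell be dropped, lowering the entropy of this partition from 2 bits to log2(3) bits.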




The epsilon-delta entropy of the PMS is now defined as the infimum of the 
entropies of all epsilon-delta partitions of the PMS. This entropy is essentially 
the minimum number of bits necessary to describe outcomes with a precision of 
at least epsilon, when outcomes of total probability at most delta need not be 
handled by the data compression system. 

The problems of data compression for a given data source are described as 
follows: First, create a PMS which is a good model of the probability law and 
the fidelity law of the experiment. Second, find out which epsilon is needed for 
the required precision. Third, accept a certain failure probability delta. 
Fourth, find the epsilon-delta entropy, or a good approximation to it. Fifth, 
settle on an efficiency that you are willing to accept. (For example, an 
efficiency of one-half means that twice as many bits are being sent as the 
absolute minimum given by the epsilon-delta entropy. The efficiency concept is 
regarded as a better choice than a "data compression ratio" concept.) Sixth, find 
an epsilon-delta partition of the PMS which achieves the required efficiency in a 
reasonably mechanizable fashion. 

This mechanization amounts to finding the partition set to which an outcome 
belongs, and it could be the most difficult part of data compression. For 
example, if one has a PMS of functions on an interval and the distance is given 
by a square-error criterion, it might be sufficient to compute and transmit the 
first 20 Fourier coefficients. 
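The Fourier example can be checked numerically. The sketch below, an illustration rather than the author's procedure, samples a function at n points, keeps only the low harmonics of its discrete Fourier series, and reports the mean square error. By Parseval's relation, that error equals the energy in the discarded coefficients. Keeping harmonics 0 through H transmits 2H + 1 real numbers, so H = 10 corresponds roughly to the 20 coefficients mentioned above. The function names are hypothetical.

```python
import cmath
import math


def fourier_coefficients(x):
    """Discrete Fourier coefficients c_k = (1/n) * sum_t x_t * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]


def truncation_error(x, harmonics):
    """Mean square error after keeping only harmonics 0..`harmonics` (assumes harmonics < n/2).

    Index k and its negative-frequency partner n - k are kept together, so the
    reconstruction of a real signal stays real.
    """
    n = len(x)
    c = fourier_coefficients(x)
    kept = [c[k] if min(k, n - k) <= harmonics else 0 for k in range(n)]
    recon = [sum(kept[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real
             for t in range(n)]
    return sum((a - b) ** 2 for a, b in zip(x, recon)) / n
```

For a signal made of a third harmonic of unit amplitude plus a fifth harmonic of amplitude 0.5, keeping harmonics up to 4 leaves an error of 0.125 (the mean square of the discarded fifth harmonic), while keeping up to 5 reproduces the samples essentially exactly.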

The model presented covers almost any conceivable data transmission 
problem. A mathematical difficulty, however, is that of finding epsilon-delta 
entropies. There are a few results for the case in which the PMS is the space of 
square-integrable functions on a closed and bounded interval, with probability 
induced by a Gaussian stochastic process on the interval. This model, however, 
covers a wide range of data sources which produce functions of time. 

In closing, it should be pointed out that the theory of PMS's and their epsilon- 
delta entropies is still under very active development. This work is the result 
of a joint effort of the author and Howard Rumsey, Jr., also of JPL. 
15. DATA COMPRESSION AND DATA USERS 


D. G. Bourke* 

Jet Propulsion Laboratory, CIT 
Pasadena, California 


Data compression will achieve a reasonable decoupling of the data sources 
from the limitations of the communications channel, but it necessitates much 
closer liaison with the data users. Some of the design constraints and require- 
ments imposed by the data users during the conceptual and preliminary design 
of an engineering data system which incorporated data compression are discussed. 
The continuous interaction with the data users was a great aid in solving the at- 
tendant design problems. 

As a secondary benefit derived from intimate data user liaison, much valu- 
able groundwork was laid for later phases of the system development. 


*Mr. Bourke is now an employee of IBM, Federal Systems Division. 
16. ADAPTIVE DATA CONTROL AND PROCESSING FOR SCIENTIFIC 
EXPERIMENTS WITH ON-BOARD PLANETARY PROBES 


D. W. Slaughter 

Jet Propulsion Laboratory, CIT 
Pasadena, California 


Criteria are presented which influence the design approach to adaptive 
systems for the acquisition, encoding, processing, and management of scientific 
data onboard interplanetary spacecraft. Emphasis is placed upon system con- 
siderations rather than upon hardware cost limitations, although practicality is 
not ignored. The criteria considered include the scientific objectives of the 
mission, characteristics of the sensors and ancillaries, and the utilization of 
ground commands. 

The techniques which the scientist uses to process and evaluate the received 
flight data, including cross-correlation of the data from several sensors, will be 
evaluated for their effect on the design of onboard systems. The possibility for 
unexpectedness in the scientific phenomena (or deviation from the scientist's 
model) will be considered; also covered are the scientist's anxiety concerning 
the masking or rejection of unexpected data by the onboard systems and the gen- 
eration of false messages, including those generated by failure modes in the 
adaptive system. 
17. A PRIORI AND A POSTERIORI STUDIES OF ADAPTIVE 
TELEMETRY TECHNIQUES FOR DEEP SPACE MISSIONS 


R. F. Trost 

Jet Propulsion Laboratory, CIT 
Pasadena, California 


During the past several years the Spacecraft Telemetry and Command Sec- 
tion at the Jet Propulsion Laboratory (JPL) has been studying two very general 
classes of spacecraft data, namely, engineering telemetry data and scientific 
video data. Although some a priori studies were performed in each class, most 
studies were done in an a posteriori manner. 

An important result from the a posteriori studies of engineering telemetry 
data was the development of a spacecraft Engineering Data Handling System 
(EDHS). Direct consultations with the data users dictated the system organization 
and characteristics of the EDHS. 

Basically, the EDHS is divided into four subsystems. Each has a transfer 
function which is considered best for the subset of measurements which it 
processes. These subsets are termed operational, operational-performance, 
and performance. They are processed by the F1 Absolute Rate Commutator 
subsystem, the F2 Data Compression (with a minimum time-delay feature) sub- 
system, and the F3 Data Compression subsystem, respectively. 

The characteristics of the F1 subsystem are quite simple. Measurements 
are sampled at one of two constant rates regardless of changes in the communi- 
cation link bit rate. The constant rates are stored in a small memory and se- 
lected individually via a "user control line" which is sampled simultaneously 
with the user's measurement. Operational measurements processed by the F1 
subsystem are usually those which are highly critical to the success of the 
mission. 

The characteristics of the F2 and F3 subsystems are very similar, with one 
exception. Although both contain data compression and the attendant buffer 
memories, the F2 subsystem also has a minimum time-delay feature which permits 
some of the more important data to bypass the buffer and be received in real 
time. This feature is necessary because many of the operational-performance 
measurements processed by F2 are normally very static; however, sudden 
abnormal changes must be relayed to the receiving station as soon as possible 
if corrective measures are to be initiated. 

In addition to the F1, F2, and F3, there is a fourth subsystem called the 
Master Multiplexer. This is simply a time multiplexer with a programmed 
priority scheme for accepting the outputs from the other three subsystems and 
producing a continuous data bit stream to the modulator. 

Finally, it should be mentioned that a command reprogramming capability 
exists for updating processing parameters for each channel. Also, a provision 
has been made for periodic confidence sampling which produces a complete 
readout of all channels at predetermined times. 

In addition to the engineering data studies, an a posteriori video data 
compression study is also currently in progress at JPL. Television pictures 
from the Ranger and Mariner spacecraft are being compressed by the use of 
various algorithms for compression and addressing. The compression and 
reconstruction are done with a digital computer, and preliminary results for 
data to which no noise has been added are favorable. 

For the selected Ranger IX picture, a first-order predictor with an aperture 
of K = 1 (i.e., one that does not destroy information) achieved a typical Net 
Compression Ratio (NCR), which includes addressing information, of about unity. 
For the same aperture, the first-order interpolator yielded an NCR of 1.2. When 
the aperture was increased to K = 2, the first-order predictor yielded an NCR of 
1.4, and the first-order interpolator yielded an NCR of 1.8. 

A typical NCR range for the first eighteen Mariner IV pictures was 1.3 to 
2.4 when they were compressed using the first-order predictor with K = 1. For 
the same aperture, the first-order interpolator yielded an NCR range of 2.5 to 
9.0. When the aperture was increased to K = 2, the respective NCRs also 
increased by a factor of two or three. 
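The abstract does not specify JPL's predictor and interpolator algorithms, so the following is only a hedged sketch of one common variant of each. The predictor extrapolates a straight line through the two most recently transmitted samples. The interpolator uses the "fan" method: it keeps the range of slopes from the last transmitted point that stay within the aperture of every sample so far, and transmits a new endpoint when that range becomes empty. The strict test |error| < K is an assumption, chosen so that K = 1 is lossless for integer samples, consistent with the remark above. The ratio these functions support is a sample ratio (samples in per sample sent); unlike the NCR, it ignores addressing bits. All names are hypothetical.

```python
import math


def first_order_predictor(samples, K):
    """Indices of samples sent; the others lie strictly within K of the extrapolated line."""
    sent = []
    for i, x in enumerate(samples):
        if len(sent) >= 2:
            a, b = sent[-2], sent[-1]
            slope = (samples[b] - samples[a]) / (b - a)
            if abs(x - (samples[b] + slope * (i - b))) < K:
                continue  # redundant: the receiver can extrapolate it
        sent.append(i)
    return sent


def first_order_interpolator(samples, K):
    """Fan-method endpoints (index, value); assumes K > 0.

    Endpoint values are kept as floats; a real encoder would quantize them.
    """
    points = [(0, float(samples[0]))]
    t0, x0 = points[0]
    lo, hi = -math.inf, math.inf  # admissible slopes from the last endpoint
    for t in range(1, len(samples)):
        dt = t - t0
        new_lo = max(lo, (samples[t] - K - x0) / dt)
        new_hi = min(hi, (samples[t] + K - x0) / dt)
        if new_lo < new_hi:
            lo, hi = new_lo, new_hi
            continue
        # Fan is empty: close the segment at t - 1 using the middle slope.
        slope = (lo + hi) / 2
        t0, x0 = t - 1, x0 + slope * (t - 1 - t0)
        points.append((t0, x0))
        lo, hi = samples[t] - K - x0, samples[t] + K - x0  # new fan, dt = 1
    last = len(samples) - 1
    if last > t0:
        slope = (lo + hi) / 2
        points.append((last, x0 + slope * (last - t0)))
    return points


def interpolate(points):
    """Receiver side: straight lines between consecutive endpoints."""
    out = []
    for (ta, xa), (tb, xb) in zip(points, points[1:]):
        out.extend(xa + (xb - xa) * (t - ta) / (tb - ta) for t in range(ta, tb))
    out.append(points[-1][1])
    return out
```

On a pure ramp, both methods send only two values. On noisier data, the interpolator usually needs fewer endpoints than the predictor needs samples, which matches the direction of the NCR figures quoted above, though this sketch does not reproduce them.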

These topics have been presented to permit others to understand better the 
type of work currently in progress in adaptive telemetry within the Spacecraft 
Telemetry and Command Section of the Telecommunications Division at the Jet 
Propulsion Laboratory. 
18. ADAPTIVE MODULATION AND DEMODULATION 
FOR TELEMETRY 


E. J. Baghdady 

ADCOM, Inc. 

Cambridge, Massachusetts 


The signal design objectives for the most efficient utilization of available 
primary power on board a spacecraft are outlined. Methods for implementing 
the necessary steps are described, evaluated, and compared. 

For spacecraft/ground communication links having a capacity which varies 
over the course of a mission, some form of data rate adaptation may be desirable. 
This paper considers the problem of preselecting a set of discrete data rates 
optimized under various criteria for the expected channel variation. The role of 
feedback communication (repeat-request) as a data rate adaptation technique is 
also discussed. Additionally, a feedback demodulation technique for the adaptive 
demodulation of FM/FM telemetry is described and evaluated. 
19. BIT-PLANE ENCODING 


J. W. Schwartz 

Institute for Defense Analyses 
Arlington, Virginia 


and 

R. C. Barker 

Yale University 
New Haven, Connecticut 


Bit-plane encoding, a source encoding technique designed for use with data 
gathered in space probes, is described. The application of the technique to 
certain data gathered by the Explorer XII satellite is discussed. In addition, the 
general approach to adaptive telemetry taken in the course of the research is 
presented. 

Bit-plane encoding is intended for use aboard spacecraft.*† The encoder 
implementation consists of a memory to store data samples, a monitor, and a 
code box. Sample values in binary form are received sequentially and stored in 
the memory. While the values are being stored, the monitor makes certain 
measurements on each bit plane to determine how each plane is to be treated. 
When a complete group of values has been stored, the encoding procedure begins. 
First some bits are read out of the monitor to indicate the operation which the 
code box will perform on each bit plane; then the monitor controls the readout of 
the memory, one bit plane at a time, and selects the operations to be performed 
by the code box. Both the monitor and the code box perform simple operations 
on binary sequences. 


*Schwartz, J. W., "Data Processing in Scientific Space Probes," Tech. Note 6, Yale Univ., 
September 1963. 

†Schwartz, J. W., and Barker, R. C., "Bit-Plane Encoding: A Technique for Source Encoding," 
IEEE Trans. on Aerospace and Electronic Systems, AES-2(4):385-392, July 1966. 

Bit-plane encoding is especially useful when the data have an amplitude 
spectrum which is concentrated in different ranges in different time intervals. 
With a stored group of 128 samples, bit-plane encoding could be used to describe 
energetic particle counts gathered by Explorer XII with less than 50 percent as 
many bits as were actually used and with no loss of information. The technique 
also conveniently allows certain useful information-destroying operations. 
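The operations chosen by the monitor are not specified here, so the following Python sketch assumes a hypothetical three-way choice per bit plane: an all-zeros plane costs a 2-bit header and nothing else, an all-ones plane likewise, and any other plane is flagged with 1 bit and sent verbatim. As in the procedure described above, all plane headers are read out first, then the planes are read out one at a time, most significant first. It is an illustration of the bit-plane idea, not the authors' code box.

```python
def bit_planes(samples, width):
    """Split `width`-bit samples into bit planes, most significant plane first."""
    return [[(s >> b) & 1 for s in samples] for b in range(width - 1, -1, -1)]


def encode_group(samples, width):
    """Headers for every plane, then the contents of the planes sent verbatim."""
    header, body = [], []
    for plane in bit_planes(samples, width):
        if not any(plane):
            header += [0, 0]  # all zeros: no plane bits follow
        elif all(plane):
            header += [0, 1]  # all ones: no plane bits follow
        else:
            header += [1]     # mixed plane: sent verbatim
            body += plane
    return header + body


def decode_group(bits, n, width):
    """Invert encode_group for a group of n samples."""
    pos, kinds = 0, []
    for _ in range(width):
        if bits[pos] == 1:
            kinds.append("raw")
            pos += 1
        else:
            kinds.append("ones" if bits[pos + 1] else "zeros")
            pos += 2
    samples = [0] * n
    for kind in kinds:  # most significant plane first
        if kind == "raw":
            plane, pos = bits[pos:pos + n], pos + n
        else:
            plane = [1 if kind == "ones" else 0] * n
        samples = [(s << 1) | b for s, b in zip(samples, plane)]
    return samples
```

For a group of 128 twelve-bit samples whose values stay within 0 to 15, the eight high-order planes are all zero, so the group costs 16 + 4 + 4 × 128 = 532 bits instead of 1536, with no loss of information. Dropping low-order planes entirely would be one example of an information-destroying option.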
20. EXPERIENCES WITH SMALL ADAPTIVE DATA PROCESSORS 


D. H. Schaefer 

Goddard Space Flight Center 
Greenbelt, Maryland 


Data processors have flown in Goddard's small scientific satellites starting 
with Vanguard III. Two of these processors with adaptive attributes are de- 
scribed. 

The first of these devices is an optical aspect data processor that flew in 
the Atmosphere Explorer satellite. This processor had effectively nine separate 
digital inputs from Sun, Moon, and Earth sensors. It was desired to determine 
which of these had had an input during a frame of telemetry; and, furthermore, 
in which sixteenth of the frame period certain of these inputs had arrived. Only 
eight bits of telemetry per frame were allocated for the transmission of all this 
information. By use of a variable format where the first bit sent determined 
the meaning of all subsequent bits, and by the use of a four-level priority system, 
enough information was transmitted to determine the aspect of the satellite. 

The second device described is an experimental digital tracking device that 
has been designed to "clean up" the noisy signals from rubidium vapor magne- 
tometers so that onboard processing can be accomplished. 

This device operates by sampling the magnetometer signal and its noise at 
times determined by a voltage-controlled oscillator. To begin with, the device 
samples the input signal every half cycle of the voltage-controlled oscillator. If 
the sampled voltage is positive at the first sample time, negative at the next, 
positive at the next, and so forth, the oscillator is tracking the signal and no 
corrections are necessary. Violation of this basic rule leads to further tests to 
determine whether the frequency of the voltage-controlled oscillator is high or 
low. To keep noise from initiating incorrect commands, a given number (say, 
sixteen) of "high" or "low" decisions in a row must be made before any correction 
pulses are issued to change the frequency of the voltage-controlled oscillator. 
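The decision logic just described can be sketched in a few lines of Python. The sketch assumes that the separate high/low test (which the abstract does not describe) supplies one decision per violation, and that the run counter restarts after each correction; both are assumptions, and the names are hypothetical.

```python
def sign_violations(samples):
    """Indices i where half-cycle samples i-1 and i fail to alternate in sign."""
    return [i for i in range(1, len(samples))
            if (samples[i] > 0) == (samples[i - 1] > 0)]


class CorrectionGate:
    """Issue a VCO correction only after `required` identical decisions in a row."""

    def __init__(self, required=16):
        self.required = required
        self.last = None
        self.count = 0

    def feed(self, decision):
        """`decision` is "high" or "low"; returns "decrease", "increase", or None."""
        if decision == self.last:
            self.count += 1
        else:
            self.last, self.count = decision, 1  # a change of opinion restarts the run
        if self.count < self.required:
            return None
        self.last, self.count = None, 0  # assumed: start a fresh run after correcting
        return "decrease" if decision == "high" else "increase"
```

Isolated or alternating decisions, as noise tends to produce, never reach the threshold, so no correction pulse is issued for them.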

The device can also be used to determine if a signal is present in the noise, 
and, if not, to switch the phase of the feedback to the magnetometer so that a 
signal can be produced. The decision that no signal is present is made when 
approximately equal numbers of "high" and "low" correction indications are being 
produced. If this is the case, it is assumed that only noise is present, and the 
feedback is switched. 
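The signal-presence test can be sketched the same way. This Python version assumes a hypothetical sliding window of recent correction indications and a tolerance on their imbalance; the abstract gives neither value, so the window of 64 and tolerance of 0.2 are placeholders.

```python
from collections import deque


class SignalPresenceMonitor:
    """Flag 'noise only' when recent high and low indications roughly balance."""

    def __init__(self, window=64, tolerance=0.2):
        self.recent = deque(maxlen=window)  # +1 for "high", -1 for "low"
        self.tolerance = tolerance

    def feed(self, decision):
        self.recent.append(1 if decision == "high" else -1)

    def noise_only(self):
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        imbalance = abs(sum(self.recent)) / len(self.recent)
        return imbalance <= self.tolerance
```

A True result would be the cue to switch the phase of the magnetometer feedback; a persistent one-sided run of indications means a signal is being tracked.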
21. A DATA MANAGEMENT SIMULATION EXERCISE 


William F. Higgins 

Sylvania Electric Products, Inc. 
Waltham, Massachusetts 


A data management simulation that has been programmed on a general- 
purpose computer is presented. The simulation is aimed at defining the queue 
buffer control requirements for adaptive processing. The input data in the simu- 
lation model contain priority and nonpriority data, but priority data are not com- 
pressed in the simulation. The compressor used in the simulation model for 
nonpriority channels is a zero-order floating aperture predictor; the aperture is 
a function of system control parameters. The control is aimed at compressing 
or deleting nonpriority data in such a way as to maintain a reasonable amount of 
data precision and to make efficient use of the queue buffer. 

The input data are contained in 26 channels. Priority data make up 10 percent 
of the data flow, and nonpriority data 90 percent. The difference between the 
input data flow and the output data flow per unit time is $\dot{q}$; this difference is 
accumulated in a buffer. The number of data words in the buffer is called the 
queue, $q$. The queue buffer control equation is a function of $q$ and $\dot{q}$. The buffer 
control has been aimed at preventing the queue from underflowing or overflowing. 

The input data are Gaussian. The initial simulation runs using various control 
functions were generally unsatisfactory. The buffer was used inefficiently 
and control was maintained at the price of data precision. The input-output rate 
of the system is an important parameter in the control function. For a particular 
input-output rate, the aperture on the controlled (nonpriority) channels for steady- 
state operation can be expressed in terms of standard deviations of the input 
data. For an input-output word rate of 10 to 2, the aperture averaged 2.3 standard 
deviations for a simulation run. This was expected and it validated the program. 
To maintain precision, it was necessary to modify the control equation by having 
nonpriority channels deleted in a specified way as a function of the aperture. 

This improved control was paid for by having nonpriority channels occasionally 
deleted. The duty cycle for these channels was a function of the input-output 
rate for a given input sequence. 
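The structure of such a simulation can be sketched in Python. This is not the simulation described here. It uses 10 channels instead of 26 (1 priority and 9 nonpriority), so that the input-to-output word rate is 10 to 2 and priority data are 10 percent of the flow. It also assumes independent unit-variance Gaussian samples and a hypothetical control law, aperture = max(0, k_q·q + k_qdot·q̇), expressed in standard deviations. Channel deletion is omitted. The gains are placeholders, and no claim is made that the sketch reproduces the 2.3-standard-deviation figure reported above.

```python
import random


def simulate(steps=5000, nonpriority=9, priority=1, out_rate=2,
             capacity=200, k_q=0.02, k_qdot=0.05, seed=1):
    """Queue-controlled zero-order floating-aperture predictor (hypothetical gains)."""
    rng = random.Random(seed)
    last_sent = [0.0] * nonpriority
    q, qdot, lost = 0, 0, 0
    apertures, queues = [], []
    for _ in range(steps):
        # Hypothetical control law: aperture grows with the queue and its rate.
        aperture = max(0.0, k_q * q + k_qdot * qdot)
        words_in = priority  # priority words are never compressed
        for ch in range(nonpriority):
            x = rng.gauss(0.0, 1.0)
            if abs(x - last_sent[ch]) > aperture:
                last_sent[ch] = x  # sample sent; the aperture floats to it
                words_in += 1
        qdot = words_in - out_rate  # input flow minus output flow per step
        new_q = q + qdot
        lost += max(0, new_q - capacity)   # overflow: words discarded
        q = min(capacity, max(0, new_q))   # underflow: output idles
        apertures.append(aperture)
        queues.append(q)
    return {"apertures": apertures, "queues": queues, "lost": lost}
```

Averaging `apertures` over the second half of a run gives the steady-state aperture for this input-output rate, and `lost` shows whether control was kept without overflow; tailoring the gains per channel would be the next step the text describes.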

The nonpriority channels were modified to have different statistics, and the 
control function had to be tailored to particular sets of channels. Although the 
simulation runs were terminated before the nonstationary data control was fully 
defined, certain general observations could be made. When there is enough 
correlation between channels of data, these may be processed adaptively as a 
group. However, in general, the control must be tailored to the individual 
channel. One parameter useful in the control of nonstationary channels is the 
short-time averaged standard deviation $\sigma_T(t)$. 
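The abstract does not say how σ_T(t) is averaged. One plausible reading, assumed here, is the standard deviation of the most recent T samples of a channel. Under that assumption, a sliding-window sketch is short, with the hypothetical parameter `window` playing the role of T.

```python
import math
from collections import deque


def short_time_sd(samples, window):
    """Standard deviation of the last `window` samples at each time step."""
    buf = deque(maxlen=window)
    out = []
    for x in samples:
        buf.append(x)
        mean = sum(buf) / len(buf)
        out.append(math.sqrt(sum((v - mean) ** 2 for v in buf) / len(buf)))
    return out
```

A rising σ_T(t) on a channel would signal that its aperture or duty cycle needs adjusting, which is the kind of per-channel tailoring the simulation pointed to.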

The simulation exercise, in general, has emphasized the need to tailor the 
processing and control to the experiments and/or sensors involved. 
22. ADAPTIVE TELEMETRY - PAST PERFORMANCE AND FUTURE POTENTIAL 

J. Purcell and R. Muller 

Goddard Space Flight Center 
Greenbelt, Maryland 


The telemetry systems that were designed to meet the requirements of 
large scientific satellites are examined. These spacecraft carried many 
experiments with varied output data characteristics, and thereby provide good 
examples for missions where not all experimental data can be processed by a 
single general rule. The benefits and problems created by the adaptability in these systems 
are used as a basis to forecast the impact of increased adaptability on future 
spacecraft programs. Examples of onboard data processors now under con- 
sideration are presented. 
23. SIGNAL POWER CONTROL ACCORDING TO 
MESSAGE INFORMATION CONTENT 


J. J. Metzner 

New York University 
Bronx, New York 


The level of message information contained in satellite telemetry data can 
vary widely with respect to time. It can be highly inefficient to provide at all 
times for the maximum rate of data transmission. Compressive message 
encoding, a method which improves efficiency, requires substantial buffer 
storage to minimize lost data during the periods of high message information 
content. 

In many space communication applications, the total energy transmitted is 
likely to be far more of a limitation than instantaneous peak power. Methods by 
which transmitted power and the transmitted data rate can be varied according 
to the needs dictated by the message information content are proposed. Such a 
distribution of available energy constitutes an efficient utilization of communi- 
cation resources. Specifically, it can serve either of two purposes: In conjunc- 
tion with compressive coding, it can serve as a replacement for the bulk of the 
buffer storage requirements, as well as improve communication efficiency, or, 
if certain transmitting procedures prove to be practical, it can replace com- 
pressive coding itself as a means of improving efficiency. 

The efficient use of energy is discussed, and it is pointed out that, where 
bandwidth is of little or no consideration, the total information that can be 
transmitted in any time interval is limited only by the total signal energy 
employed during that time; it does not depend on how that energy is distributed 
within the interval. Where bandwidth is limited, there is some loss due to the 
nonuniform distribution of energy, but the procedure of varying the power 
retains much of its usefulness. 
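The energy argument can be illustrated with Shannon's capacity formula C = W log2(1 + P/(N0 W)) bits per second, applied slot by slot. The Python sketch below, with hypothetical names and numbers, compares spreading a fixed energy uniformly over an interval with concentrating it in a few slots. As the bandwidth W grows, both totals approach the energy-limited value E/(N0 ln 2) bits, and the penalty for nonuniform power nearly vanishes. At narrow bandwidth, the concavity of the logarithm makes the bursty schedule noticeably worse.

```python
import math


def bits_in_interval(powers, slot, bandwidth, n0):
    """Capacity-limited bits over consecutive slots of `slot` seconds.

    powers[i] is the signal power in slot i, bandwidth is in Hz, and n0 is the
    one-sided noise spectral density, in consistent units.
    """
    return sum(slot * bandwidth * math.log2(1 + p / (n0 * bandwidth)) for p in powers)


def wideband_limit(energy, n0):
    """Bits obtainable from a total energy as bandwidth grows without bound."""
    return energy / (n0 * math.log(2))
```

With 10 units of energy over ten 1-second slots and N0 = 1, a uniform schedule gives 10 bits at W = 1 Hz while two 5-unit bursts give about 5.2 bits; at W = 1000 Hz the two schedules differ by well under 1 percent, and both stay just below the limit of about 14.4 bits.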

Several ways for varying the power and data rate are suggested. One in- 
volves changing the number of waveforms in time T, providing two or more 
different modes of operation. A second involves changing the time base; that is, 
in order to send at twice the rate, the waveforms would be squeezed to occupy 
one-half the time while the power was doubled. A third scheme involves the use 
of an auxiliary transmitter only during periods of high information content. 
Fourth, a method could be envisioned whereby, if a sample value is the same 
(within some tolerance) as its previous value, nothing is sent and (ideally) no 
power is used except for synchronization. When a sample does change, the 
value of the sample is transmitted with full power. This latter method, which is 
applicable only in the unconventional situation where total transmitter power 
utilization in the unmodulated mode is far less than that in the fully modulated 
mode, has the potential of serving as a replacement for compressive coding. 

Other factors of the signal power control are also considered in conjunction 
with the variation according to message information content. They are: control 
by command, by preprogramming according to distance, by message priority, 
and by the current status of the energy reserve. 
24. CODING PROBLEMS OF ADAPTIVE TELEMETRY 


S. W. Golomb 

University of Southern California 
Los Angeles, California 


There are three completely different approaches to "sophisticated teleme- 
try." The most traditional of these is to seek efficient channel utilization in the 
sense of Shannon, which involves the consideration of the communications link 
as a common carrier to be upgraded by matched filters or error correction or 
other techniques independent of the data being communicated. The second level 
of penetration into the problem is properly called data compression and assumes 
an a priori model for the type of data which will probably be encountered. In 
terms of this model, salient parameters of the data (e.g., histogram levels or quan- 
tiles) are computed at the transmitter terminus, and only these compressed 
data are communicated. The approach with the third degree of ingenuity and 
sophistication is that of designing the data gathering and preprocessing equip- 
ment in such a way that the processing will adapt to the phenomena actually en- 
countered. Only this third-level approach is accurately termed adaptive teleme- 
try, and there are persuasive political-technical reasons why little if any 
adaptive equipment is likely to be tried in space communications in the near 
future. Since "efficient channel utilization" is now generally appreciated, the 
significant research and development emphasis for the next several years 
should be in the data compression area. The coding problems here involve the 
extraction, representation, and protection of the significant portions of the ex- 
perimental data. 
25. SCHEME FOR ADAPTIVE THRESHOLD DECODING 


J. L. Massey 

University of Notre Dame 
Notre Dame, Indiana 


The additive Gaussian noise channel is a theoretical model which corre- 
sponds closely to the actual channel in space communication systems. The usual 
approach to the problem of reliable data transmission through this channel em- 
ploys binary antipodal signal segments with successive segments chosen in ac- 
cordance with the digits of a binary error-correcting code. The maximal-length 
codes are the class of binary block codes for which the resulting transmitted 
waveforms generate the simplex structure in signal space. The performance of 
these codes on the Gaussian channel has been tabulated by Viterbi*. 

A class of nonblock binary codes, the convolutional uniform codes found by 
the author†, give essentially the same performance as the equivalent-rate, 
maximal-length codes. Moreover, the uniform codes are simpler to encode, an 
important consideration for space vehicles where there is a premium on encoder 
simplicity. The interesting feature of the uniform codes is the ease with which 
the code rate can be changed to match the changing signal-to-noise ratio, as 
occurs when the distance of a space probe from the Earth increases. 

For any positive integer $m$, there is a binary uniform code with rate 
$R = 2^{-m}$ and minimum distance $d = (m + 2)2^{m-1}$. The same encoding circuit 
for rate $R = 2^{-M}$ can be used for all rates $R = 2^{-m}$, $m = 1, 2, \ldots, M$, 
simply by disabling the last $M - m$ stages of the $M$-stage generator. The fact 
that the rate is an inverse power of two facilitates synchronization of the data 
bits and the $1/R$ times as numerous encoded bits. (This is in contrast with the 
maximal-length case, where there are Diophantine problems associated with rate 
changing.) 

Moreover, the uniform codes can be decoded very simply by a threshold 
decoder‡ which uses to full advantage the likelihood information for the received 


*Viterbi, A., "Phase-Coherent Communication over the Continuous Gaussian Channel," in Digital 
Communications with Space Applications, S. Golomb, Ed., Prentice-Hall, 1964, pp. 106-134. 

†Massey, J., "Uniform Codes," to appear in IEEE Trans. Info. Th., IT-12, April 1966. 

‡Massey, J., "Threshold Decoding," M.I.T. Press, 1963. 
bits. This permits real-time operation with simple decoding equipment and 
essentially maximum likelihood decoding performance. The decoder can be 
adapted for rate changes in a manner very similar to that for the encoder. It 
can also be shown that, unlike some forms of nonblock decoding, there is no 
danger that a decoding mistake will trigger a long succession of further decoding 
mistakes. This can be important in applications such as the space channel 
where a feedback link to the encoder is not readily available. 
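The rate and minimum-distance formulas quoted for the uniform codes can be tabulated directly. The short Python sketch below only evaluates those formulas, as stated in this abstract, for each m up to a chosen M. It also lists the encoded bits per data bit (1/R) and the number of generator stages disabled. It does not construct the codes or the M-stage generator, and the function name is hypothetical.

```python
from fractions import Fraction


def uniform_code_table(M):
    """Rows (m, rate R, minimum distance d, encoded bits per data bit, disabled stages)."""
    return [(m, Fraction(1, 2 ** m), (m + 2) * 2 ** (m - 1), 2 ** m, M - m)
            for m in range(1, M + 1)]


for m, rate, d, expansion, disabled in uniform_code_table(3):
    print(f"m={m}  R={rate}  d={d}  bits per data bit={expansion}  stages disabled={disabled}")
```

Because 1/R is always a power of two, switching between rows changes the ratio of encoded bits to data bits by a factor of two per step, which is what makes the synchronization simple.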



26. AN EMPIRICAL BAYES TECHNIQUE IN DETECTION 
AND ESTIMATION THEORY 


S. Schwartz 

University of Michigan 
Ann Arbor, Michigan 


An empirical Bayes technique is applied to the class of problems in which a 
sequence of information-bearing signals is assumed to be a stationary random 
process whose underlying probability structure is unknown. The signals are 
observed in additive Gaussian noise with a known, nonwhite spectrum. The 
statistical problem is to extract information from each member of the sequence 
by estimation or detection, depending on the problem. The empirical Bayes 
technique uses accumulated past observations to obtain consistent estimates of 
the unknown distributions or related quantities. These estimates are used to form 
a sequence of one-stage decision (estimation) procedures which converge to the 
optimum procedure one would use if all pertinent distributions were known. 

This empirical technique differs from other learning procedures and from 
what has been called nonsupervised pattern recognition in a number of respects. 
First, statistical dependence in the information-bearing signals and in the 
additive noise is permitted. Second, the signals (at the receiver) are taken as 
random processes as opposed to fixed (but unknown) signals buried in the noise. 
Third, the method of estimating the unknown distributions improves on the 
methods suggested in the past in that it gives faster convergence and simpler 
implementation. 
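The abstract does not give the estimator itself. As a hedged, scalar illustration of the empirical Bayes idea, and not the author's method (which handles random-process signals and nonwhite noise), the Python sketch below uses Tweedie's formula. This is a standard empirical Bayes estimator for additive Gaussian noise of known variance: the posterior mean is x + σ² f'(x)/f(x), where f is the marginal density of the observations. The unknown f is replaced by a Gaussian kernel estimate formed from past observations, so accumulated data stand in for the unknown prior. The bandwidth of 0.1 and the two-level test signal in `demo` are assumptions.

```python
import math
import random


def empirical_bayes_estimate(x, past, noise_sd, bandwidth=0.1):
    """Tweedie estimate of s from x = s + n, n ~ N(0, noise_sd**2).

    f'(x)/f(x) is taken from a Gaussian kernel density estimate built on `past`
    observations; the kernel normalizing constants cancel in the ratio.
    """
    weight_sum = 0.0
    slope_sum = 0.0
    for xi in past:
        w = math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
        weight_sum += w
        slope_sum += w * (xi - x) / bandwidth ** 2
    if weight_sum == 0.0:
        return x  # no nearby past data: fall back to the raw observation
    return x + noise_sd ** 2 * slope_sum / weight_sum


def demo(n_past=3000, n_fresh=200, noise_sd=0.3, seed=0):
    """Hypothetical signal: s = +1 or -1 with equal probability, unknown to the estimator."""
    rng = random.Random(seed)

    def observe():
        s = rng.choice((-1.0, 1.0))
        return s, s + rng.gauss(0.0, noise_sd)

    past = [observe()[1] for _ in range(n_past)]
    fresh = [observe() for _ in range(n_fresh)]
    mse_raw = sum((x - s) ** 2 for s, x in fresh) / n_fresh
    mse_eb = sum((empirical_bayes_estimate(x, past, noise_sd) - s) ** 2
                 for s, x in fresh) / n_fresh
    return mse_raw, mse_eb
```

As more past observations accumulate and the bandwidth shrinks, the kernel estimate approaches the true marginal density, and the estimator approaches the Bayes rule that would be used if the signal distribution were known.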
27. ADAPTIVE DATA COMPRESSION TECHNIQUES 


D. R. Weber and C. M. Kortman 

Lockheed Missile and Space Co. 
Sunnyvale, California 


Adaptive telemetry is a term that has gained considerable popularity over 
the past several years. To assist in the definition of this term it is desirable to 
relate adaptivity to three levels of sophistication. The first and simplest form 
of adaptivity is the "adaptable" system. An adaptable process is one which re- 
acts in a rigid manner to one or more outside parameters or commands. The 
second is the "adaptive" system, and is defined as a process which reacts to 
parameters monitored within itself. The third and most sophisticated form of 
adaptivity is the "self-adaptive" system. A self-adaptive process involves 
monitoring, learning, and reaction. In a truly self-adaptive system the reaction 
process is invented as an outcome of the learning process. The majority of 
techniques presently used in data compression fall within the areas of adaptable 
or adaptive processes. Four basic categories of data compression, namely 
parameter extraction, adaptive sampling, redundancy reduction, and encoding, are 
recognized. The remainder of the paper is devoted to the application of redundancy 
reduction to video data. 

In measuring the effectiveness of several different redundancy reduction 
algorithms, it is necessary to present bandwidth compression ratio as some 
function of error. Attempts to use peak and rms errors were not successful 
since there was little correlation between these measures of error and the sub- 
jective evaluation of the experimenters. Hence, it was necessary to relate band- 
width compression ratio to the subjective analysis of picture fidelity. The results 
of a study sponsored by the Goddard Space Flight Center showed that Tiros 
satellite pictures can be compressed successfully, resulting in bandwidth com- 
pression ratios of four to one. Compression ratios of five to one were achieved 
on Ranger pictures, and six to one on scanned Gemini photographs. The first- 
order polynomial interpolators were shown to be more effective than predictors 
and the zero-order interpolator. 

The use of an adaptive aperture and adaptive filter techniques as a means of 
buffer control is presented in detail. Both techniques were successful in mini- 
mizing buffer overflow and buffer underflow. However, the adaptive filter con- 
trol technique resulted in greater picture degradation than adaptive aperture. It 
was concluded that the high-frequency components in pictorial data are of greater 
importance for maintaining picture fidelity than are the absolute intensity values. 
The use of an adaptive aperture based upon the measure of recent redundant se- 
quences proved to be of value for improving overall picture fidelity. 

The subject of sampling relative to redundancy reduction was discussed in 
detail during the question and answer period. In summary, sampling and re- 
dundancy reduction should always be considered jointly if the performance of a 
system is to be maximized. In an undersampled system excessive errors will 
result from aliasing. If data are oversampled the overall bandwidth require- 
ments will increase since a redundancy reduction process cannot eliminate all 
redundancies and efficiency will be lost because of the encoding process. 